Preface


The implicit function theorem is, along with its close cousin the inverse func- 
tion theorem, one of the most important, and one of the oldest, paradigms in 
modern mathematics. One can see the germ of the idea for the implicit func- 
tion theorem in the writings of Isaac Newton (1642-1727), and Gottfried Leib- 
niz’s (1646-1716) work explicitly contains an instance of implicit differentiation. 
While Joseph Louis Lagrange (1736-1813) found a theorem that is essentially a 
version of the inverse function theorem, it was Augustin-Louis Cauchy (1789- 
1857) who approached the implicit function theorem with mathematical rigor and 
it is he who is generally acknowledged as the discoverer of the theorem. In Chap- 
ter 2, we will give details of the contributions of Newton, Lagrange, and Cauchy 
to the development of the implicit function theorem. 

The form of the implicit function theorem has evolved. The theorem first was 
formulated in terms of complex analysis and complex power series. As interest 
in, and understanding of, real analysis grew, the real-variable form of the theorem 
emerged. First the implicit function theorem was formulated for functions of two 
real variables, and the hypothesis corresponding to the Jacobian matrix being non- 
singular was simply that one partial derivative was nonvanishing. Finally, Ulisse 
Dini (1845-1918) generalized the real-variable version of the implicit function 
theorem to the context of functions of any number of real variables. As math- 
ematicians understood the theorem better, alternative proofs emerged, and the 
associated modern techniques have allowed a wealth of generalizations of the 
implicit function theorem to be developed. 

Today we understand the implicit function theorem to be an ansatz, or a way 
of looking at problems. There are implicit function theorems, inverse function 
theorems, rank theorems, and many other variants. These theorems are valid on 
Euclidean spaces, manifolds, Banach spaces, and even more general settings.
Roughly speaking, the implicit function theorem is a device for solving equations, 
and these equations can live in many different settings. 

In addition, the theorem is valid in many categories. The textbook formulation
of the implicit function theorem is for $C^k$ functions. But in fact the result is
true for $C^{k,\alpha}$ functions, Lipschitz functions, real analytic functions, holomorphic
functions, functions in Gevrey classes, and for many other classes as well. The 
literature is rather opaque when it comes to these important variants, and a part of 
the present work will be to set the record straight. 

Certainly one of the most powerful forms of the implicit function theorem is 
that which is attributed to John Nash (1928- ) and Jürgen Moser (1928-1999).
This device is actually an infinite iteration scheme of implicit function theorems.
It was first used by John Nash to prove his celebrated imbedding theorem for
Riemannian manifolds. Jürgen Moser isolated the technique and turned it into a
powerful tool that is now part of partial differential equations, functional analysis,
several complex variables, and many other fields as well. This text will culminate
with a version of the Nash-Moser theorem, complete with proof.

This book is one both of theory and practice. We intend to present a great many 
variants of the implicit function theorem, complete with proofs. Even the impor- 
tant implicit function theorem for real analytic functions is rather difficult to pry 
out of the literature. We intend this book to be a convenient reference for all such 
questions, but we also intend to provide a compendium of examples and of tech- 
niques. There are applications to algebra, differential geometry, manifold theory, 
differential topology, functional analysis, fixed point theory, partial differential 
equations, and to many other branches of mathematics. One learns mathematics 
(in part) by watching others do it. We hope to set a suitable example for those 
wishing to learn the implicit function theorem. 

The book should be of interest to advanced undergraduates, graduate students, 
and professional mathematicians. Prerequisites are few. It is not necessary that 
the reader be already acquainted with the implicit function theorem. Indeed, the 
first chapter provides motivation and examples that should make clear the form 
and function of the implicit function theorem. A bit of knowledge of multivari- 
able calculus will allow the reader to tackle the elementary proofs of the implicit 
function theorem given in Chapter 3. Rudiments of real and functional analysis are 
needed for the third proof in Chapter 3 which uses the Contraction Mapping Fixed 
Point Principle. Some knowledge of complex analysis is required for a complete 
reading of the historical material—this seems to be unavoidable since the earliest 
rigorous work on the implicit function theorem was formulated in the context of 
complex variables. In many cases a willing suspension of disbelief and a bit of 
determination will serve as a thorough grounding in the basics. 

There are many sophisticated applications of implicit function theorems,
particularly the Nash-Moser theorem, in modern mathematics. The imbedding theorem
for Riemannian manifolds, the imbedding theorem for CR manifolds, and the
deformation theory of complex structures are just a few of them. Richard Hamilton’s
masterful survey paper (see the Bibliography) indicates several more applications
from different parts of mathematics. While each of these is a lovely tour de force
of modern analytical technique, it is also the case that each requires considerable 
technical background. In order to keep the present volume as self-contained as 
possible, we have decided not to include any of these modern applications; in- 
stead we have provided exclusively classical applications of the implicit function 
theorem. For a basic book on the subject, we have found this choice to be most 
propitious. 

We intend this book to be a useful resource for scientists of all types. We have 
exerted a considerable effort to make the bibliography extensive (if not complete). 
Therefore topics that can only be touched on here can be amplified with further 
reading. Although there are no formal exercises, the extensive remarks provide 
grist for further thought and calculation. We trust that our exposition will imbue 
our readers with some of the same fascination that led to the writing of this book. 

There are a number of people whom we are pleased to thank for their helpful 
comments and contributions: David Barrett, Michael Crandall, John P. D’Angelo,
Gerald B. Folland, Judith Grabiner, Robert E. Greene, Lars Hörmander, Seth
Howell, Kang-Tae Kim, László Lempert, Maurizio Letizia, Richard Rochberg,
Walter Rudin, Steven Weintraub, Dean Wills, Hung-Hsi Wu. Robert Burckel cast 
his critical eye on every page of our manuscript and the result is a much cleaner 
and more accurate book. Librarian Barbara Luszczynska performed yeoman ser- 
vice in helping us to track down references. This book is better because of the 
friendly assistance of all these good people; but, of course, all remaining failings 
are the province of the authors. 


Washington University, St. Louis Steven G. Krantz 
Oregon State University, Corvallis Harold R. Parks 
1
Introduction to the Implicit
Function Theorem 


1.1 Implicit Functions 


To the beginning student of calculus, a function is given by an analytic expression 
such as 


f(x) =x? 42x? —x -3, (1.1) 


syy=yy+l, (1.2) 


or 
$$h(t) = \cos(2\pi t). \tag{1.3}$$


In fact, 250 years ago this was the approach taken by Léonard Euler (1707-1783) 
when he wrote (see Euler [EB 88]):


A function of a variable quantity is an analytic expression composed 
in any way whatsoever of the variable quantity and numbers or con- 
stant quantities. 


Almost immediately, one finds the notion of “function as given by a formula” 
to be too limited for the purposes of calculus. For example, the locus of 


$$y^5 + 16y - 32x^3 + 32x = 0 \tag{1.4}$$


Figure 1.1. The Locus of Points Satisfying (1.4) 


defines the nice subset of $\mathbb{R}^2$ that is sketched in Figure 1.1. The figure leads us to
suspect that the locus is the graph of y as a function of x, but no formula for that 
function exists. 

In contrast to the naive definition of functions as formulas, the modern, set- 
theoretic definition of a function is formulated in terms of the graph of the func- 
tion. Precisely, a function with domain X and codomain or range Y is a subset, 
let us call it f, of the cartesian product 


$$X \times Y = \{(x, y) : x \in X,\ y \in Y\}$$


having the properties that (i) for each $x \in X$ there is an element $(x, y) \in f$, and
(ii) if $(x, y) \in f$ and $(x, \tilde{y}) \in f$, then $y = \tilde{y}$. In case these two properties hold,
the choice of $x \in X$ determines the unique $y \in Y$ for which $(x, y) \in f$; because
of this uniqueness, we find it a convenient shorthand to write


$$y = f(x)$$
to mean that $(x, y) \in f$.


Example 1.1.1 The locus defined by (1.4) has the property that, for each choice
of $x \in \mathbb{R}$, there is a unique $y \in \mathbb{R}$ such that the pair $(x, y)$ satisfies the equation.
Thus there is a function, $f$, in the modern sense, such that the graph $y = f(x)$ is
the locus of (1.4).
To confirm this assertion, we fix a value of $x \in \mathbb{R}$ and consider the left-hand
side of (1.4) as a function of $y$ alone. That is, we will examine the behavior of


$$F(y) = y^5 + 16y - 32x^3 + 32x$$


with $x$ fixed.
Since the powers of $y$ in $F(y)$ are odd, we have $\lim_{y \to -\infty} F(y) = -\infty$ and
$\lim_{y \to +\infty} F(y) = +\infty$. Also we have


$$F'(y) = 5y^4 + 16 > 0,$$


so $F(y)$ is strictly increasing as $y$ increases. By the intermediate value theorem,
together with this monotonicity, we see that $F(y)$ attains the value 0 for a unique
value of $y$. That value of $y$ is the value of $f(x)$ for the fixed value of $x$ under
consideration. □
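The monotonicity argument of Example 1.1.1 also suggests how the implicitly defined function can be computed: because F(y) is strictly increasing and changes sign, bisection in y locates its unique zero. The following Python sketch illustrates this under stated assumptions. The names `F` and `implicit_f`, the bracket-doubling step, and the tolerance are our own illustrative choices, and the x-dependent terms are taken to be -32x^3 + 32x; only the terms in y matter for the argument.

```python
def F(x, y):
    """Left-hand side of (1.4); the x-terms are assumed to be -32x^3 + 32x."""
    return y**5 + 16.0 * y - 32.0 * x**3 + 32.0 * x


def implicit_f(x, tol=1e-12):
    """Return the unique y with F(x, y) = 0, located by bisection."""
    lo, hi = -1.0, 1.0
    # F is strictly increasing in y and unbounded in both directions,
    # so doubling the bracket must eventually enclose the zero.
    while F(x, lo) > 0.0:
        lo *= 2.0
    while F(x, hi) < 0.0:
        hi *= 2.0
    # Standard bisection: keep F(x, lo) <= 0 <= F(x, hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for x in (-1.5, 0.0, 0.5, 2.0):
        y = implicit_f(x)
        print(f"x = {x:5.2f}  f(x) = {y: .6f}  residual = {F(x, y): .2e}")
```

No formula for f is used anywhere; the function is evaluated one point at a time, which is all that the set-theoretic definition requires.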


Note that it is not clear from (1.4) by itself that y is a function of x. Only by 
doing the extra work in the example can we be certain that y really is uniquely 
defined as a function of x. Because it is not immediately clear from the defin- 
ing equation that a function has been given, we say that the function is defined 
implicitly by (1.4). In contrast, when we see 


y = f(x) (1.5) 


written, we then take it as a hypothesis that f(x) is a function of x; no additional 
verification is required, even when in the right-hand side the function is simply 
a symbolic representation as in (1.5) rather than a formula as in (1.1), (1.2), and 
(1.3). To distinguish them from implicitly defined functions, the functions in (1.1), 
(1.2), (1.3), and (1.5) are called (in this book) explicit functions. 


1.2 An Informal Version of the Implicit 
Function Theorem 


Thinking heuristically, one usually expects that one equation in one variable

$$F(x) = c,$$

$c$ a constant, will be sufficient to determine the value of $x$ (though the existence
of more than one, but only finitely many, solutions would come as no surprise).¹
When there are two variables, one expects that it will take two simultaneous
equations

$$\begin{aligned}
F(x, y) &= c, \\
G(x, y) &= d,
\end{aligned}$$

$c$ and $d$ constants, to determine the values of both $x$ and $y$. In general, one expects
that a system of $m$ equations in $m$ variables

$$\begin{aligned}
F_1(x_1, x_2, \ldots, x_m) &= c_1, \\
F_2(x_1, x_2, \ldots, x_m) &= c_2, \\
&\ \,\vdots \\
F_m(x_1, x_2, \ldots, x_m) &= c_m,
\end{aligned} \tag{1.6}$$

$c_1, c_2, \ldots, c_m$ constants, will be just the right number of equations to determine
the values of the variables. But of course we must beware of redundancies among
the equations. That is, we must check that the system is nondegenerate, in the
sense that a certain determinant does not vanish.

¹ What we are doing is informally describing the notion of “degrees of freedom” that is commonly
used in physics.

In case the equations in (1.6) are all linear, we can appeal to linear algebra to
make our heuristic thinking precise (see any linear algebra textbook): A necessary
and sufficient condition to guarantee that (1.6) has a unique solution for all values
of the constants $c_i$ is that the matrix of coefficients of the linear system has rank
$m$.

We continue to think heuristically: If there are more variables than equations in
our system of simultaneous equations, say

$$\begin{aligned}
F_1(x_1, x_2, \ldots, x_n) &= c_1, \\
F_2(x_1, x_2, \ldots, x_n) &= c_2, \\
&\ \,\vdots \\
F_m(x_1, x_2, \ldots, x_n) &= c_m,
\end{aligned} \tag{1.7}$$

where the $c$’s are still constants and where $n > m$, then we would hope to treat
those $n - m$ extra variables as parameters, thereby forcing $m$ of the variables to be
implicit functions of the $n - m$ parameters. Again, in the case of linear functions,
the situation is well understood: As long as the matrix of coefficients has rank $m$,
it will be possible to express some set of $m$ of the variables as functions of the
other $n - m$ variables. Moreover, for any set of $m$ independent columns of the
matrix of coefficients of the linear system, the corresponding $m$ variables can be
expressed as functions of the other variables.

In the general case, as opposed to the linear case, the system of equations (1.7) 
defines a completely arbitrary subset of $\mathbb{R}^n$ (an arbitrary closed subset if the
functions are continuous). Only under special conditions will (1.7) define $m$ of the
variables to be implicit functions of the other n — m variables. It is the purpose of 
the implicit function theorem to provide us with a powerful method, or collection 
of methods, for insuring that we are in one of those special situations for which 
the heuristic argument is correct. 

The implicit function theorem is grounded in differential calculus; and the 
bedrock of differential calculus is linear approximation. Accordingly, one works 
in a neighborhood of a point $(p_1, p_2, \ldots, p_n)$, where the equations in (1.7) all
hold at $(p_1, p_2, \ldots, p_n)$ and where the functions in (1.7) can all be linearly
approximated by their differentials. We are now in a position to state the implicit
function theorem in informal terms (we shall give a more formal enunciation
later):


(Informal) Implicit Function Theorem Let the functions in (1.7) be
continuously differentiable. If (1.7) holds at $(p_1, p_2, \ldots, p_n)$ and if,
when the functions in (1.7) are replaced by their linear approximations,
a particular set of $m$ variables can be expressed as functions of
the other $n - m$ variables, then, for (1.7) itself, the same $m$ variables
can be defined to be implicit functions of the other $n - m$ variables in
a neighborhood of $(p_1, p_2, \ldots, p_n)$. Additionally, the resulting implicit
functions are continuously differentiable and their derivatives
can be computed by implicit differentiation using the familiar method 
learned as part of the calculus. 


Let us look at a very simple example in which there is only one, well-understood, 
equation in two variables. We will treat this example in detail for the benefit of 
the reader who is not already comfortable with the ideas we have been discussing. 


Example 1.2.1 Consider
$$x^2 + y^2 = 1. \tag{1.8}$$


The locus of points defined by (1.8) is the circle of radius 1 centered at the origin.
Of course, in a suitable neighborhood of any point $P = (p, q)$ satisfying (1.8)
and for which $q \neq 0$, we can solve the equation to express $y$ explicitly as


$$y = \pm\sqrt{1 - x^2},$$


where the choice of $+$ or $-$ is dictated by whether $q$ is positive or negative.
(Likewise, we could just as easily have dealt with the case in which $p \neq 0$ by solving
for $x$ as an explicit function of $y$.)

The usefulness of the implicit function theorem stems from the fact that we 
can avoid explicitly solving the equation. To take the point of view of the implicit 
function theorem, we linearly approximate the left-hand side of (1.8). In a
neighborhood of a point $P = (p, q)$, a continuously differentiable function $F(x, y)$ is
linearly approximated by

$$a\,\Delta x + b\,\Delta y + c,$$


where $a$ is the value of $\partial F/\partial x$ evaluated at $P$, $\Delta x$ is the change in $x$ made in
going from $P = (p, q)$ to the point $(x, y)$, $b$ is the value of $\partial F/\partial y$ evaluated at
$P$, $\Delta y$ is the change in $y$ made in going from $P = (p, q)$ to the point $(x, y)$, and
$c$ is the value of $F$ at $P$. In this example, $F(x, y) = x^2 + y^2$, the left-hand side of
(1.8).

We compute


$$\left.\frac{\partial}{\partial x}\left(x^2 + y^2\right)\right|_{(x,y)=(p,q)} = 2p$$

and

$$\left.\frac{\partial}{\partial y}\left(x^2 + y^2\right)\right|_{(x,y)=(p,q)} = 2q.$$


Thus, in a neighborhood of the point $P = (p, q)$ which satisfies (1.8), the left-hand
side of (1.8) is linearly approximated by


$$(2p)(x - p) + (2q)(y - q) + 1 = 2px + 2qy - 1,$$


where we have used $p^2 + q^2 = 1$. When we replace the left-hand side of (1.8) by
its linear approximation and simplify we obtain


$$px + qy = 1, \tag{1.9}$$


which, of course, is the equation of the tangent line to the circle at the point $P$.
The implicit function theorem tells us that whenever we can solve the approximating
linear equation (1.9) for $y$ as a function of $x$, then the original equation
(1.8) defines $y$ implicitly as a function of $x$. Clearly, we can solve (1.9) for $y$ as
a function of $x$ exactly when $q \neq 0$, so it is in this case that the implicit function
theorem guarantees that (1.8) defines $y$ as an implicit function of $x$. This agrees
perfectly with what we found when we solved the equation explicitly. □
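The agreement asserted at the end of Example 1.2.1 can also be checked numerically. The sketch below compares the slope read off from the tangent line px + qy = 1, namely -p/q, with a difference quotient of the explicit solution of (1.8). The function names, the sample point, and the step size h are illustrative assumptions and are not part of the example itself.

```python
import math


def explicit_y(x, sign):
    """Explicit solution y = sign * sqrt(1 - x^2) of (1.8)."""
    return sign * math.sqrt(1.0 - x * x)


def slope_from_linearization(p, q):
    """Slope dy/dx of the tangent line p*x + q*y = 1; requires q != 0."""
    if q == 0.0:
        raise ValueError("q = 0: the linearized equation cannot be solved for y")
    return -p / q


def slope_by_finite_difference(p, q, h=1e-6):
    """Central-difference slope of the explicit solution through (p, q)."""
    sign = 1.0 if q > 0.0 else -1.0
    return (explicit_y(p + h, sign) - explicit_y(p - h, sign)) / (2.0 * h)


if __name__ == "__main__":
    p, q = 0.6, -0.8  # a point on the unit circle with q != 0
    print("from linearization:   ", slope_from_linearization(p, q))
    print("from explicit formula:", slope_by_finite_difference(p, q))
```

The first function never solves (1.8); it uses only the coefficients of the linear approximation, which is the point of view the implicit function theorem takes.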


Remark 1.2.2 Looking at the circle, we see that it is impossible to use (1.8) to 
define y as a function of x in any open interval around x = 1 or in any open 
interval around $x = -1$. For other equations, an implicit function may happen to
exist in a neighborhood of a point at which the implicit function theorem does not 
apply but, in such a case, the function may or may not be differentiable. 


An example in which there are three variables and two equations will serve to 
illustrate the connection between linear algebra and the implicit function theorem. 


Example 1.2.3 Fix $R > \sqrt{2}$ and consider the pair of equations


$$\begin{aligned}
x^2 + y^2 + z^2 &= R^2, \\
xy &= 1,
\end{aligned} \tag{1.10}$$

near the point $P = (1, 1, \rho)$, where $\rho = \sqrt{R^2 - 2}$.
We could solve the system explicitly. But it is instructive to instead take the
point of view of the implicit function theorem. There are three variables and two
equations, so the heuristic argument above tells us to expect two variables to be
implicit functions of the third.
Computing partial derivatives and evaluating at $(1, 1, \rho)$ to linearly approximate
the functions in (1.10), we obtain the equations


$$\begin{aligned}
x + y + \rho z &= 2 + \rho^2, \\
x + y &= 2.
\end{aligned} \tag{1.11}$$

This system of equations is the linearization of the original system. The first
equation in (1.11) defines the tangent plane at $P$ of the locus defined by the first
equation in (1.10) and the second equation in (1.11) defines the tangent plane at the
same point of the locus defined by the second equation in (1.10). Clearly, the two 
tangent planes have a non-trivial intersection because both automatically contain 
the point P. 

The requirement that needs to be verified before the implicit function theorem 
can be applied is that we can solve the linear system (1.11) for two of the variables 
as a function of the third. Geometrically, this corresponds to showing that the 
intersection of the tangent planes is a line, because it is along a line in $\mathbb{R}^3$ that two
of the variables can be expressed as a function of the third.

We now appeal to linear algebra. The matrix of coefficients for the linear system
is

$$D = \begin{pmatrix} 1 & 1 & \rho \\ 1 & 1 & 0 \end{pmatrix}.$$


The necessary and sufficient condition for being able to solve (1.11) for two of
the variables as a function of the third is that $D$ have rank 2. Clearly, the rank of
$D$ is 2 if and only if $\rho \neq 0$. Thus, when $R > \sqrt{2}$, the implicit function theorem
then guarantees that some pair of the variables can be defined implicitly in terms
of the remaining variable.

On the other hand, when $\rho = 0$, or equivalently when $R = \sqrt{2}$, the rank of $D$
is 1 and the implicit function theorem does not apply. Not only does the implicit
function theorem not apply, but it is easy to see that $(1, 1, 0)$ and $(-1, -1, 0)$ are
the only solutions of (1.10).

Assume now that $\rho \neq 0$. The implicit function theorem tells us that if we can
solve the linear system (1.11) for a particular pair of the variables in terms of
the third, then the original system of equations defines the same two variables as
implicit functions of the third near $(1, 1, \rho)$. To determine which pairs of variables
are functions of the third, we again appeal to linear algebra. Any two independent
columns of $D$ will correspond to variables in (1.11) that can be expressed as
functions of the third. Thus, the implicit function theorem gives us the pair $x(y)$
and $z(y)$ satisfying (1.10), or the pair $y(x)$ and $z(x)$ satisfying (1.10).

In this example, not only does the implicit function theorem not allow us to
assert the existence of $x(z)$ and $y(z)$ satisfying (1.10), but no such functions exist.

□
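The column test used in Example 1.2.3 is mechanical enough to automate. The sketch below forms the coefficient matrix D of the linearized system for a given value of ρ and lists the pairs of variables whose columns are independent, by checking the corresponding 2 x 2 minors. The function names and the zero threshold `eps` are illustrative assumptions.

```python
from itertools import combinations


def linearization_matrix(rho):
    """Coefficient matrix D of the linearized system at P = (1, 1, rho)."""
    return [[1.0, 1.0, rho],
            [1.0, 1.0, 0.0]]


def solvable_pairs(D, names=("x", "y", "z"), eps=1e-12):
    """List the pairs of variables whose two columns of D are independent.

    Each such pair can be expressed, in the linear system, as functions
    of the remaining variable.
    """
    pairs = []
    for i, j in combinations(range(3), 2):
        # 2x2 minor built from columns i and j of D
        minor = D[0][i] * D[1][j] - D[0][j] * D[1][i]
        if abs(minor) > eps:
            pairs.append((names[i], names[j]))
    return pairs


if __name__ == "__main__":
    print(solvable_pairs(linearization_matrix(1.0)))  # rho != 0
    print(solvable_pairs(linearization_matrix(0.0)))  # rho == 0: rank of D is 1
```

For ρ ≠ 0 the pair (x, y) never appears in the list, since the first two columns of D are equal; this matches the observation that x(z) and y(z) cannot be obtained.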


1.3. The Implicit Function Theorem Paradigm 


In the last section, we described the heuristic thinking behind the implicit func- 
tion theorem and stated the theorem in informal terms. Even though the heuristic 
argument behind the result is rather simple, the implicit function theorem is a fun- 
damental and powerful part of the foundation of modern mathematics. Originally 
conceived over two hundred years ago as a tool for studying celestial mechanics 
(see also Section 2.3), the implicit function theorem now has many formulations
and is used in many parts of mathematics. Virtually every category of functions 
has its own special version of the implicit function theorem, and there are par- 
ticular versions adapted to Banach spaces, algebraic geometry, various types of 
geometrically degenerate situations, and to functions that are not even smooth. 
Some of these are quite sophisticated, and have been used in startling ways to 
solve important open problems (the imbedding problem for Riemannian mani- 
folds and the imbedding problem for CR manifolds are just two of them). 


The implicit function theorem paradigm: Given three topological
spaces $X$, $Y$, $Z$ (these spaces need not be distinct), a continuous
function $\mathcal{F} : X \times Y \to Z$, and points $X_0 \in X$, $Y_0 \in Y$, $Z_0 \in Z$ such
that
$$\mathcal{F}(X_0, Y_0) = Z_0,$$

an implicit function theorem must describe an appropriate nondegeneracy
condition on $\mathcal{F}$ at $(X_0, Y_0)$ sufficient to imply the existence
of neighborhoods $U$ of $X_0$ in $X$, $V$ of $Y_0$ in $Y$, and of a function
$F : U \to V$ satisfying the following two conditions:


$$\begin{aligned}
F(X_0) &= Y_0, \\
\mathcal{F}[X, F(X)] &= Z_0, \quad \text{for all } X \in U.
\end{aligned} \tag{1.12}$$


Additionally, an implicit function theorem will entail the conclusion
that the function $F$ is well behaved in some appropriate sense, and
it is usually an important part of the theorem that $F$ is the unique
function satisfying (1.12).


The simplest case of the above paradigm is to let all three of the topological
spaces be the real numbers $\mathbb{R}$. The function $\mathcal{F}$ is assumed to be continuously
differentiable and the nondegeneracy condition is the nonvanishing of the partial
derivative with respect to $Y$. We now state the result formally as a theorem.


Theorem 1.3.1 Let $\mathcal{F}$ be a real-valued continuously differentiable function
defined in a neighborhood of $(X_0, Y_0) \in \mathbb{R}^2$. Suppose that $\mathcal{F}$ satisfies the two
conditions


$$\begin{aligned}
\mathcal{F}(X_0, Y_0) &= Z_0, \\
\frac{\partial \mathcal{F}}{\partial Y}(X_0, Y_0) &\neq 0.
\end{aligned}$$


Then there exist open intervals $U$ and $V$, with $X_0 \in U$, $Y_0 \in V$, and a unique
function $F : U \to V$ satisfying


$$\mathcal{F}[X, F(X)] = Z_0, \quad \text{for all } X \in U,$$


and this function $F$ is continuously differentiable with


$$\frac{dY}{dX}(X_0) = F'(X_0) = -\left[\frac{\partial \mathcal{F}}{\partial X}(X_0, Y_0)\right] \Big/ \left[\frac{\partial \mathcal{F}}{\partial Y}(X_0, Y_0)\right]. \tag{1.13}$$
Because this theorem involves partial derivatives, the theorem per se is not usually
taught in a first calculus course. Instead, a disguised form of Equation (1.13) is
taught: The student is told to go ahead and differentiate $\mathcal{F}(X, Y) = Z_0$ with
respect to X using the chain rule and assuming that dY/dX exists. If it is then 
possible to solve for dY/dX when X = Xo and Y = Yo, the student is assured 
that the result is correct (as the theorem in fact guarantees). This somewhat ad hoc 
process is called implicit differentiation. Once the beginning student of calculus 
has learned about partial differentiation, Theorem 1.3.1 is likely to be the first 
version of the implicit function theorem presented. 
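Formula (1.13) can be tested on a concrete function. In the sketch below, the sample function Y^3 + Y + X^2, the point (1, 0), the finite-difference step sizes, and all function names are illustrative assumptions of ours. The slope given by the quotient of partial derivatives is compared with a difference quotient of the implicit function itself, which is obtained by solving for Y with Newton's method at two nearby values of X.

```python
def script_F(X, Y):
    """Hypothetical sample function: Y^3 + Y + X^2."""
    return Y**3 + Y + X**2


def partial(g, X, Y, wrt, h=1e-6):
    """Central-difference approximation to a first partial derivative of g."""
    if wrt == "X":
        return (g(X + h, Y) - g(X - h, Y)) / (2.0 * h)
    return (g(X, Y + h) - g(X, Y - h)) / (2.0 * h)


def slope_by_formula(g, X0, Y0):
    """Right-hand side of (1.13): -(dg/dX) / (dg/dY) at (X0, Y0)."""
    g_Y = partial(g, X0, Y0, "Y")
    if g_Y == 0.0:
        raise ValueError("nondegeneracy condition fails at this point")
    return -partial(g, X0, Y0, "X") / g_Y


def solve_for_Y(g, X, Z0, Y_guess, steps=50):
    """Newton's method in Y for g(X, Y) = Z0, started from Y_guess."""
    Y = Y_guess
    for _ in range(steps):
        Y -= (g(X, Y) - Z0) / partial(g, X, Y, "Y")
    return Y


def slope_by_solving(g, X0, Y0, k=1e-4):
    """Difference quotient of the implicit function itself near X0."""
    Z0 = g(X0, Y0)
    Y_plus = solve_for_Y(g, X0 + k, Z0, Y0)
    Y_minus = solve_for_Y(g, X0 - k, Z0, Y0)
    return (Y_plus - Y_minus) / (2.0 * k)


if __name__ == "__main__":
    print("formula (1.13):     ", slope_by_formula(script_F, 1.0, 0.0))
    print("difference quotient:", slope_by_solving(script_F, 1.0, 0.0))
```

For this sample function the partial derivatives at (1, 0) are 2 and 1, so both computations should return a value close to -2.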

By approaching this basic freshman calculus version of the implicit function 
theorem via the paradigm, we see that a natural generalization would arise by 
replacing R by C (that generalization is stated and proved in Section 2.4). In fact, 
there is no limit to the number of variations that can be made on this theme by 
altering the choice of topological spaces, or the category of functions considered, 
or the type of nondegeneracy conditions used, or the conclusions about what is a 
“well behaved” implicit function. 

A corollary of Theorem 1.3.1 is obtained by setting


$$\mathcal{F}(X, Y) = X - G(Y),$$


with $G : \mathbb{R} \to \mathbb{R}$ a continuously differentiable function. The nondegeneracy
requirement becomes $G'(Y_0) \neq 0$. Taking $Z_0 = 0$ and assuming $X_0 = G(Y_0)$,
Theorem 1.3.1 guarantees the existence of a function $F$ satisfying


$$G[F(X)] = X,$$
that is, $F$ is the inverse function to $G$. We also conclude that
$$F'(X_0) = 1/G'(Y_0).$$


This result is the inverse function theorem taught in freshman calculus. 

Both the implicit function theorem and the inverse function theorem might be 
proved in an honors course in calculus, but most students will first see the proofs in 
a course on advanced calculus. Nonetheless, a student will probably never really 
apply the theorems until more advanced mathematical work. 


Example 1.3.2 Consider the equation
$$x = y - \varepsilon \sin(y), \tag{1.14}$$


where $\varepsilon$ is a small constant. While the notation we are using is different, (1.14) has
the same form as Kepler’s equation in celestial mechanics. A classical problem
was to solve (1.14) for $y$ as a function of $x$, that is, to find the inverse function.
This cannot be done in closed form using elementary functions, but a positive
result can be obtained using infinite series. The resulting formula is known as the
Lagrange inversion theorem. All of this is discussed in more detail in Section 2.3.
Here we note that


$$\frac{d}{dy}\left[y - \varepsilon \sin(y)\right] = 1 - \varepsilon \cos(y) \neq 0$$
holds, provided $|\varepsilon| < 1$. Thus, the simple freshman calculus form of the inverse
function theorem described above applies. □
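Although (1.14) has no closed-form inverse, the inverse function is easy to evaluate numerically. The sketch below rewrites (1.14) as y = x + ε sin(y) and iterates; since the right-hand side has derivative of absolute value at most |ε| < 1 in y, the iteration is a contraction and converges. This is a simple numerical substitute and is not the series of the Lagrange inversion theorem; the function name, tolerance, and iteration cap are illustrative assumptions.

```python
import math


def kepler_inverse(x, eps, tol=1e-14, max_iter=10_000):
    """Solve x = y - eps*sin(y) for y by fixed-point iteration.

    Requires |eps| < 1, the same condition under which the derivative
    1 - eps*cos(y) is nonvanishing.
    """
    if abs(eps) >= 1.0:
        raise ValueError("need |eps| < 1 for the iteration to be a contraction")
    y = x  # the solution for eps = 0 serves as the starting guess
    for _ in range(max_iter):
        y_next = x + eps * math.sin(y)
        if abs(y_next - y) <= tol:
            return y_next
        y = y_next
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    x, eps = 1.0, 0.3
    y = kepler_inverse(x, eps)
    print("y =", y, " check:", y - eps * math.sin(y))
```

Each step reduces the error by a factor of at most |ε|, so small values of ε give rapid convergence.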


In general, the implicit function theorem and the inverse function theorem can 
be thought of as equivalent, companion formulations of the same basic idea. In 
any particular context, one may find it easier to take one approach or the other. 

To continue our more formal presentation of the implicit function theorem,
we give a simple, if typical, formulation of the theorem. For convenience
in this rather elementary introduction, we state the result in $\mathbb{R}^3$. Be assured that
the implicit function theorem is true in any dimensional space—even in infinite 
dimensional spaces. 


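Theorem 1.3.3 We let $U \subseteq \mathbb{R}^3$ be an open set and we assume that
$$\mathcal{F} = (\mathcal{F}_1, \mathcal{F}_2) : U \to \mathbb{R}^2$$


is a continuously differentiable function. Further assume that, at a point $a =
(a_1, a_2, a_3) \in U$, it holds that $\mathcal{F}(a) = 0$ and


$$\det \begin{pmatrix}
\dfrac{\partial \mathcal{F}_1}{\partial x_2} & \dfrac{\partial \mathcal{F}_1}{\partial x_3} \\[2ex]
\dfrac{\partial \mathcal{F}_2}{\partial x_2} & \dfrac{\partial \mathcal{F}_2}{\partial x_3}
\end{pmatrix} \neq 0.$$


Then there is a product neighborhood $V \times W \subseteq U$, with $a_1 \in V \subseteq \mathbb{R}$ and
$(a_2, a_3) \in W \subseteq \mathbb{R}^2$, and a unique, continuously differentiable mapping


$$F = (F_1, F_2) : V \to W$$
such that $(a_2, a_3) = F(a_1)$ and, for each $x_1 \in V$, it holds that
$$\mathcal{F}[x_1, F_1(x_1), F_2(x_1)] = 0.$$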


Again, we will not prove this result here, but refer the reader to Section 3.3. This 
theorem applies to Example 1.2.3 of the preceding section. 

In words, we can think of Theorem 1.3.3 in this way: Imagine a pair of
equations in the variables $x_1, x_2, x_3$ that has the form


$$\begin{aligned}
\mathcal{F}_1(x_1, x_2, x_3) &= 0, \\
\mathcal{F}_2(x_1, x_2, x_3) &= 0.
\end{aligned}$$


We wish to solve for $x_2$ and $x_3$ in terms of the remaining variable $x_1$. Ideally, $x_2$
and $x_3$ should be expressed as smooth functions of $x_1$. The condition that will
guarantee this conclusion is that the “derivative” with respect to the variables for
which we wish to solve should be invertible. Here the “derivative” is a linear map
from $\mathbb{R}^2$ to $\mathbb{R}^2$, so it is invertible if and only if the determinant is nonvanishing.


The next example of the implicit function theorem will lead to a corollary form 
of the inverse function theorem. In comparison with Theorem 1.3.3, all we really 
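change is the dimension of the domain of $\mathcal{F}$.

Theorem 1.3.4 We let $U \subseteq \mathbb{R}^4$ be an open set and we assume that
$$\mathcal{F} = (\mathcal{F}_1, \mathcal{F}_2) : U \to \mathbb{R}^2$$


is a continuously differentiable function. Further assume that, at a point $a =
(a_1, a_2, a_3, a_4) \in U$, it holds that $\mathcal{F}(a) = 0$ and


$$\det \begin{pmatrix}
\dfrac{\partial \mathcal{F}_1}{\partial x_3} & \dfrac{\partial \mathcal{F}_1}{\partial x_4} \\[2ex]
\dfrac{\partial \mathcal{F}_2}{\partial x_3} & \dfrac{\partial \mathcal{F}_2}{\partial x_4}
\end{pmatrix} \neq 0. \tag{1.15}$$


Then there is a product neighborhood $V \times W \subseteq U$, with $(a_1, a_2) \in V \subseteq \mathbb{R}^2$ and
$(a_3, a_4) \in W \subseteq \mathbb{R}^2$, and a unique, continuously differentiable mapping


$$F = (F_1, F_2) : V \to W$$
such that $(a_3, a_4) = F(a_1, a_2)$ and, for each $x = (x_1, x_2) \in V$, it holds that
$$\mathcal{F}[x_1, x_2, F_1(x), F_2(x)] = 0.$$
Once more the result is a special case of those in Section 3.3.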
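
Corollary 1.3.5 We let $Y \subseteq \mathbb{R}^2$ be an open set and we assume that
$$G = (G_1, G_2) : Y \to \mathbb{R}^2$$


is a continuously differentiable function. We further assume that, at a point $b =
(b_1, b_2) \in Y$, it holds that


$$\det \begin{pmatrix}
\dfrac{\partial G_1}{\partial y_1} & \dfrac{\partial G_1}{\partial y_2} \\[2ex]
\dfrac{\partial G_2}{\partial y_1} & \dfrac{\partial G_2}{\partial y_2}
\end{pmatrix} \neq 0. \tag{1.16}$$


Then there are neighborhoods $V, W \subseteq \mathbb{R}^2$, with $a = (a_1, a_2) = G(b) \in V$ and
$b \in W$, and a unique, continuously differentiable mapping


$$F = (F_1, F_2) : V \to W$$
such that $b = F(a)$ and, for each $x = (x_1, x_2) \in V$, it holds that
$$x = G[F(x)].$$
Proof. We define $\mathcal{F} : \mathbb{R}^2 \times Y \to \mathbb{R}^2$ by setting
$$\mathcal{F}(x_1, x_2, x_3, x_4) = (x_1, x_2) - G(x_3, x_4).$$


Equation (1.16) implies that (1.15) holds at $(a_1, a_2, b_1, b_2)$. Thus the corollary
follows from Theorem 1.3.4. □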
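The local inverse whose existence Corollary 1.3.5 asserts can be approximated by Newton's method, which uses exactly the invertibility of the Jacobian matrix of G. The sketch below does this for a sample map; the map `G`, its hand-computed Jacobian, the base point, and the stopping rule are all illustrative assumptions, and the iteration is only expected to converge for targets x sufficiently close to a = G(b).

```python
def G(y1, y2):
    """Hypothetical sample map G: R^2 -> R^2."""
    return (y1 + 0.5 * y2 * y2, y2 + 0.5 * y1 * y1)


def jacobian_G(y1, y2):
    """Jacobian matrix of the sample map, computed by hand."""
    return ((1.0, y2), (y1, 1.0))


def local_inverse(x, b, steps=50, tol=1e-13):
    """Approximate F(x) with G(F(x)) = x by Newton's method started at b."""
    y1, y2 = b
    for _ in range(steps):
        g1, g2 = G(y1, y2)
        r1, r2 = x[0] - g1, x[1] - g2  # residual x - G(y)
        (a11, a12), (a21, a22) = jacobian_G(y1, y2)
        det = a11 * a22 - a12 * a21
        if det == 0.0:
            raise ValueError("Jacobian is singular; the hypothesis fails here")
        # Solve the 2x2 system J d = r by Cramer's rule.
        d1 = (r1 * a22 - a12 * r2) / det
        d2 = (a11 * r2 - a21 * r1) / det
        y1, y2 = y1 + d1, y2 + d2
        if abs(d1) + abs(d2) < tol:
            break
    return (y1, y2)


if __name__ == "__main__":
    b = (0.2, 0.1)
    x = (0.25, 0.1)  # a target near a = G(b)
    y = local_inverse(x, b)
    print("F(x) =", y, " G(F(x)) =", G(*y))
```

Starting the iteration at b reflects the local nature of the corollary: the inverse is only claimed on a neighborhood V of a = G(b).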


In the next example, we show how the implicit function theorem, in the form 
of Corollary 1.3.5 can be applied to the study of a partial differential equation. 
Example 1.3.6 Let $W$ be an open set in $\mathbb{R}^2$ and let $u : W \to \mathbb{R}$ be a twice
continuously differentiable function. If at a point $(x_0, y_0) \in W$ we know that

$$u_{xx} u_{yy} - u_{xy}^2 \neq 0 \tag{1.17}$$

holds, where the subscripts indicate partial differentiation, then, in a neighborhood
of $(x_0, y_0) \in W$, one can make an invertible transformation from $(x, y)$ to $(\xi, \eta)$
and define a function $\omega(\xi, \eta)$ so that the formulas
$$\begin{aligned}
\omega(\xi, \eta) + u(x, y) &= x\xi + y\eta, \\
\xi = u_x, \quad & \eta = u_y, \\
x = \omega_\xi, \quad & y = \omega_\eta,
\end{aligned} \tag{1.18}$$
hold.


To see that such a transformation can be made, we apply Corollary 1.3.5 to the
function from $\mathbb{R}^2$ to $\mathbb{R}^2$ given by


$$(x, y) \mapsto \bigl(u_x(x, y),\ u_y(x, y)\bigr).$$


Equation (1.17) is exactly the hypothesis needed to apply Corollary 1.3.5 to
conclude that the transformation

$$\begin{aligned}
\xi &= u_x(x, y), \\
\eta &= u_y(x, y)
\end{aligned}$$


is invertible.
Defining $\omega$ by setting


$$\omega(\xi, \eta) = -u(x, y) + x\xi + y\eta,$$


we compute
$$\begin{aligned}
\omega_\xi &= -u_x x_\xi - u_y y_\xi + x_\xi \xi + x + y_\xi \eta \\
&= -\xi x_\xi - \eta y_\xi + x_\xi \xi + x + y_\xi \eta = x, \\
\omega_\eta &= -u_x x_\eta - u_y y_\eta + x_\eta \xi + y_\eta \eta + y \\
&= -\xi x_\eta - \eta y_\eta + x_\eta \xi + y_\eta \eta + y = y,
\end{aligned}$$
showing that all the formulas in (1.18) hold. □
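The formulas of Example 1.3.6 can be verified numerically on a simple case. The sketch below takes the hypothetical quadratic u(x, y) = x^2 + xy + 2y^2, for which the quantity u_xx u_yy - u_xy^2 equals 7 and the transformation to the new variables is linear, so that its inverse can be written down by hand. The partial derivatives of the transformed function are then approximated by central differences and compared with x and y. The choice of u, the function names, and the step size are assumptions made purely for illustration.

```python
def u(x, y):
    """Hypothetical sample function with u_xx*u_yy - u_xy**2 = 2*4 - 1 = 7."""
    return x * x + x * y + 2.0 * y * y


def to_xi_eta(x, y):
    """The transformation xi = u_x, eta = u_y for the sample u."""
    return 2.0 * x + y, x + 4.0 * y


def to_x_y(xi, eta):
    """Inverse transformation, obtained by solving the 2x2 linear system."""
    return (4.0 * xi - eta) / 7.0, (2.0 * eta - xi) / 7.0


def omega(xi, eta):
    """Transformed function omega = -u + x*xi + y*eta, with (x, y) from (xi, eta)."""
    x, y = to_x_y(xi, eta)
    return -u(x, y) + x * xi + y * eta


def omega_partials(xi, eta, h=1e-5):
    """Central-difference approximations to the two first partials of omega."""
    d_xi = (omega(xi + h, eta) - omega(xi - h, eta)) / (2.0 * h)
    d_eta = (omega(xi, eta + h) - omega(xi, eta - h)) / (2.0 * h)
    return d_xi, d_eta


if __name__ == "__main__":
    x, y = 0.3, -0.2
    xi, eta = to_xi_eta(x, y)
    print("(x, y)                 =", (x, y))
    print("(omega_xi, omega_eta)  =", omega_partials(xi, eta))
```

For a non-quadratic u the inverse transformation would have to be computed numerically, which is where Corollary 1.3.5 guarantees that a local inverse exists.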


Remark 1.3.7 The transformation effected in the example is known as a Legen- 
dre transformation in honor of Adrien Marie Legendre (1752-1833) who intro- 
duced the idea in 1789. Such a transformation can sometimes be used to simplify 
the integration of a partial differential equation. Of course, Legendre
transformations can be performed when there are more than two variables (see
Courant-Hilbert [CH 62]). There are also sophisticated uses of Legendre transformations
in mechanics (see Arnol’d [Ar 78]). 


2 
History 


2.1 Historical Introduction 


The earliest works on algebra beginning with Al-jabr w’al muqābala by
Mohammed ben Musa Al-Khowarizmi (circa A.D. 825), from which we get the
word “algebra” (and the word “algorithm”), presented problems and solutions by
numerical example. The notion of a “function,” whether explicit or implicit, would
make no sense in such a context. It was not until about 1600 that the idea of using
letters to denote both unknowns and coefficients was introduced by François Viète
(1540-1603). The algebraic methods of Viète were taken up by René Descartes
(1596-1650) and combined with Descartes’s own coordinate system inspiration.
That fundamental advance in 1637 finally brought mathematics to the point that 
the notion of a function could make sense. From the beginning, many of the func- 
tions were defined implicitly, as in the general quadratic curve 


Ax? + Bxy + Cy*+ Dx + Ey + F =0 


which arose in Descartes’ solution of the Problem of Pappus (circa 300 A.D.), a 
problem that had been unsolved for more than a millennium.! 

It appears that, before 1800, no one felt that there was a need to prove the 
existence of any implicit function. In fact, we can get a sense of their outlook 
from the words of Euler (as translated by John D. Blanton; see Euler [EB 88;
page 5]):

Indeed frequently algebraic functions cannot be expressed explicitly.
For example, consider the function Z of z defined by the equation
$Z^5 = az^2Z^3 - bz^4Z^2 + cz^3Z - 1$. Even if this equation cannot
be solved, still it remains true that Z is equal to some expression
composed of the variable z and constants, and for this reason Z shall
be a function of z.

1 For more detail on these matters, see Hairer and Wanner [HW 96].


The approach to implicit functions was to show how they behave, rather than 
to prove they exist. The work of Isaac Newton that we describe below may be 
one of the first instances of analyzing the behavior of an implicitly defined function.
In the context of calculus, Gottfried Leibniz (1646-1716) applied implicit
differentiation as early as 1684 (see [St 69; pages 276-278]).

In 1770, Joseph Lagrange proved what may be the first true implicit function 
theorem, but in its closely related form as an inverse function theorem. The result 
is now known as the Lagrange Inversion Theorem. Lagrange’s theorem is what we 
would consider a special case of the inverse function theorem for formal power 
series. 

Lagrange’s theorem is quite important for celestial mechanics. Celestial me- 
chanics occupied a central role in 18th and 19th century mathematics and La- 
grange’s theorem was very well known. Cauchy, in his quest to make mathemat- 
ics rigorous, naturally gave his attention to that theorem and its generalizations. 
So it is that William Fogg Osgood (1864-1943), one of the first great American
analysts,² attributes the implicit function theorem to Cauchy; more specifically,
Osgood cites the “Turin Memoir” of Cauchy as the source of the implicit function 
theorem. The story of Cauchy’s exile to Turin is a subject of some controversy and 
we will leave it to the reader to consult other sources, such as Belhoste [Be 91] 
and the references therein. In fact, there are two Turin Memoirs by Cauchy, and it 
is the first that contains the implicit function theorem. Also, we should note that 
the first Turin Memoir was, so to speak, printed, but not published; that is, while 
all parts of the first Turin Memoir ultimately appear in Cauchy’s collected works, 
the memoir as a unified whole does not; nonetheless, the portion of the first Turin 
Memoir containing the implicit function theorem can be found in Cauchy [Ca 16]. 

It was only later in the 19th century that the profound differences between 
complex analysis and real analysis came to be more fully appreciated. Thus the 
real-variable form of the implicit function theorem was not enunciated and proved 
until the work of Ulisse Dini (1845-1918) that was first presented at the Univer- 
sity of Pisa in the academic year 1876-1877 (see Dini [Di 07]). 

In the remainder of this chapter, we will describe the contributions of Newton, 
Lagrange, and Cauchy mentioned above. The real-variable approach, going back 
to Dini, is pervasive throughout the rest of this book. 


2 Admittedly, he did earn his Ph.D. in Europe (at Erlangen under Max Noether (1844-1921)).

2.2 Newton


The basic problem addressed by the implicit function theorem is of such
fundamental interest that the genesis of the theorem goes back to Newton. In the Latin
manuscript De Analysi per Æquationes Infinitas of 1669,³ Newton addresses the
question of expressing the solution of the equation

$$Y^3 + a^2Y - 2a^3 + axY - x^3 = 0 \tag{2.1}$$

as a series in x that will be valid near x = 0 and that will give the root $a \neq 0$
when x = 0. This computation can be found in the paragraph entitled Exempla
per Resolutionem Æquationum Affectarum. The paragraph begins with what we
now call “Newton's method” for finding roots, and the series solution is presented
as an extension of that numerical method. We know of no earlier reference that
could be considered to be a version of the implicit function theorem.

Newton refined his procedure in the 1670 manuscript De Methodis Serierum et
Fluxionum (see the paragraph De Affectarum Æquationum Reductione), and the
device constructed in this improvement is now known as the Newton polygon or
Newton diagram.

To introduce the Newton polygon, we begin with an example. 


Example 2.2.1 Consider the equation

$$y^3 + x^2y^2 - xy + x^4 = 0 \tag{2.2}$$

near x = 0. The locus of points satisfying this equation is shown in Figure 2.1.

Figure 2.1. The Locus of Points Satisfying (2.2)

3 This manuscript can be found together with its translation in Newton [NW 68].

Assume there is a solution of (2.2) of the form y = y(x) that has its graph
passing through the origin and that is defined at least for small values of x. In
particular, we will have y(0) = 0.

The idea behind the Newton polygon is to make the further assumption that we
can write

$$y(x) = x^{\alpha}\,\tilde{y}(x) \tag{2.3}$$

with $\tilde{y}(x)$ a continuous function that does not vanish when x = 0. The number $\alpha$
in (2.3) is a parameter which must be chosen appropriately. Newton's insight was
that a value of $\alpha$ should be used if and only if its use allows $\tilde{y}(0)$ to be determined.
Substituting $y = x^{\alpha}\tilde{y}$ in (2.2), we obtain


$$x^{3\alpha}\tilde{y}^3 + x^{2+2\alpha}\tilde{y}^2 - x^{1+\alpha}\tilde{y} + x^4 = 0. \tag{2.4}$$

To be able to determine $\tilde{y}(0)$ from (2.4), there must be two or more monomials in
(2.4) which have the same power of x and all other monomials must have a larger
power of x.

For instance, if we set $\alpha = 3$, then (2.4) becomes

$$x^9\tilde{y}^3 + x^8\tilde{y}^2 - x^4\tilde{y} + x^4 = 0. \tag{2.5}$$

Dividing (2.5) by $x^4$, we obtain

$$x^5\tilde{y}^3 + x^4\tilde{y}^2 - \tilde{y} + 1 = 0. \tag{2.6}$$

Setting x = 0 in (2.6), we find $\tilde{y}(0) = 1$. This tells us that the locus of points
satisfying (2.2) contains a curve approximated near x = 0 by

$$y = x^3.$$

This curve is illustrated in Figure 2.2. □


The choice $\alpha = 3$ made in the preceding example is not unique; this choice is
merely the one which causes the last two monomials in (2.2) to contain the same
power of x after the substitution $y = x^{\alpha}\tilde{y}$. In fact, for each pair of monomials
in (2.2) there is an exponent $\alpha$ which will cause those two monomials to contain
the same power of x after the substitution $y = x^{\alpha}\tilde{y}$. One convenient way to keep
track of all these possibly useful values of $\alpha$ is as follows: For each nontrivial
monomial in the equation, consider the point in the plane whose coordinates are
the exponents on x and on y. For the equation (2.2), we obtain the points (0, 3),
(2, 2), (1, 1), and (4, 0). Each line segment between a pair of these points can be
identified with a choice of $\alpha$ that causes the corresponding pair of monomials to
contain the same power of x after the substitution $y = x^{\alpha}\tilde{y}$. In fact, the slope m
of the line segment is related to $\alpha$ by the equation

$$\alpha = -1/m,$$

as the reader should verify.

Figure 2.2. The Part of the Locus Approximated by $y = x^3$


Figure 2.3 shows all the line segments corresponding to pairs of monomials in
(2.2). The associated values of $\alpha$ are 3, 1/2, 4/3, 2, 1, and −1. Only the first two
choices, corresponding to the substitutions $y = x^3\tilde{y}$ and $y = x^{1/2}\tilde{y}$, respectively,
lead to curves that approximate part of the locus. Below we describe the geometric
method used to decide which of the possible substitutions should be used.
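
The six exponents listed for (2.2) can be reproduced mechanically. The following sketch is an illustration only: the helper names `balancing_alpha` and `is_dominant` are ours, and the test used to single out the useful exponents (no monomial may receive a smaller power of x than the balanced pair) is a direct algebraic restatement of the requirement stated after (2.4), not the geometric method of the text.

```python
from fractions import Fraction
from itertools import combinations

# Exponent pairs (power of x, power of y) of the monomials in
# y^3 + x^2*y^2 - x*y + x^4 = 0, i.e. equation (2.2).
exponents = [(0, 3), (2, 2), (1, 1), (4, 0)]

def balancing_alpha(p, q):
    """The alpha for which x^i y^j -> x^(i + alpha*j) gives both monomials the
    same power of x; None if the two monomials have the same power of y."""
    (i1, j1), (i2, j2) = p, q
    if j1 == j2:
        return None
    return Fraction(i2 - i1, j1 - j2)

def is_dominant(alpha, p, points):
    """True when no monomial ends up with a smaller power of x than the balanced pair."""
    common = p[0] + alpha * p[1]
    return all(i + alpha * j >= common for (i, j) in points)

candidates = {}
for p, q in combinations(exponents, 2):
    alpha = balancing_alpha(p, q)
    if alpha is not None:
        candidates[(p, q)] = (alpha, is_dominant(alpha, p, exponents))

all_alphas = sorted(alpha for alpha, _ in candidates.values())
usable = sorted(alpha for alpha, ok in candidates.values() if ok)
print(all_alphas)  # every exponent obtained from a pair of monomials
print(usable)      # the exponents for which the balanced pair dominates
```

Exact rational arithmetic (`fractions.Fraction`) is used so that fractional exponents such as 1/2 and 4/3 are compared without rounding.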

The set of segments in Figure 2.3 encloses a convex region in the plane, namely,
the convex hull of the set of points

$$\{(0, 3), (2, 2), (1, 1), (4, 0)\}.$$

Because there is no common power of x or y in (2.2), the convex region touches
both the vertical and horizontal axes. The Newton polygon associated with (2.2)
is the part of the boundary of the convex region that goes along the bottom left
boundary of the region from the vertical axis to the horizontal axis (see
Figure 2.4). Only values of $\alpha$ corresponding to segments in the Newton polygon
allow a nonzero value of $\tilde{y}(0)$ to be determined. For example, $\alpha = -1$ corresponds
to the segment from (1, 1) to (2, 2), which is not part of the Newton polygon,
and the equation resulting from the substitution $y = x^{-1}\tilde{y}$ is

$$x^{-3}\tilde{y}^3 + \tilde{y}^2 - \tilde{y} + x^4 = 0,$$

which cannot be satisfied by any function $\tilde{y}(x)$ that is continuous and nonzero at x = 0.

Figure 2.3. Segments Corresponding to Pairs of Monomials

Figure 2.4. The Newton Polygon for (2.2)


General Construction of the Newton Polygon. The Newton polygon is used to
determine the behavior of the locus of points satisfying a polynomial equation

$$P(x, y) = \sum_{n=0}^{N} \sum_{i+j=n} a_{ij}\, x^i y^j \tag{2.7}$$


in a neighborhood of a point of the locus. By changing variables using a trans- 
lation, we may assume that a point of the locus is (0,0). We may also assume 
that there is no common factor of x or y in the polynomial. Purists might wish to 
assume the irreducibility of P, but this is not necessary for the analysis that will 
follow. 

Equation (2.1) was the example that Newton used, so we will use it here to
illustrate the process. If we make the change of variable y = Y − a in (2.1), we
obtain the equation

$$y^3 + 3ay^2 + 4a^2y + axy + a^2x - x^3 = 0. \tag{2.8}$$

In the notation of (2.7), we have

$$a_{1,0} = a^2, \quad a_{0,1} = 4a^2, \quad a_{1,1} = a, \quad a_{0,2} = 3a, \quad a_{3,0} = -1, \quad a_{0,3} = 1,$$

and all other coefficients are equal to 0.

The set of all line segments connecting pairs of points in

$$\{(i, j) : a_{ij} \neq 0\} \tag{2.9}$$

encloses a convex set K. In fact, K is the convex hull of the set given in (2.9). The
boundary of K, denoted $\partial K$, is a closed polygonal path in the first quadrant that
intersects both axes. Of the two subpaths in $\partial K$ with an endpoint in each axis,
the Newton polygon is the one nearer the origin. This construction is illustrated in
Figure 2.5 for the equation (2.8).

Figure 2.5. Constructing the Newton Polygon
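
The construction just described can be sketched in code. What follows is an illustration under assumptions, not the procedure of the text: it computes the lower convex hull with the standard monotone-chain method (a technique we have swapped in) and keeps the descending part that runs from the vertical axis to the horizontal axis. The function names `newton_polygon` and `segment_alphas` are ours.

```python
from fractions import Fraction

def cross(o, a, b):
    """Positive when the path o -> a -> b turns counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def newton_polygon(points):
    """Vertices of the lower-left boundary of the convex hull of the exponent
    pairs, listed from the vertical axis to the horizontal axis."""
    ordered = sorted(set(points))
    lower = []
    for p in ordered:
        # Monotone chain: drop points that would make the boundary non-convex.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    polygon = [lower[0]]
    for p in lower[1:]:
        if p[1] >= polygon[-1][1]:
            break  # the boundary has stopped descending toward the horizontal axis
        polygon.append(p)
    return polygon

def segment_alphas(polygon):
    """alpha = -1/slope for each segment of the polygon."""
    return [Fraction(q[0] - p[0], p[1] - q[1]) for p, q in zip(polygon, polygon[1:])]

# Exponent pairs (i, j) with a_ij != 0 for equation (2.8) and for equation (2.2).
points_28 = [(1, 0), (0, 1), (1, 1), (0, 2), (3, 0), (0, 3)]
points_22 = [(0, 3), (2, 2), (1, 1), (4, 0)]
polygon_28 = newton_polygon(points_28)
polygon_22 = newton_polygon(points_22)
print(polygon_28, segment_alphas(polygon_28))
print(polygon_22, segment_alphas(polygon_22))
```

For (2.8) the sketch should return the single segment from (0, 1) to (1, 0), and for (2.2) the two segments that give the exponents 1/2 and 3.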

To appreciate the significance of the Newton polygon, let us rewrite the
polynomial P in the form

$$P(x, y) = \sum_{j=0}^{M} A_j(x)\, x^{h_j}\, y^j, \tag{2.10}$$

where either $A_j \equiv 0$ or $A_j(0) \neq 0$ (if we were to have $A_j(0) = 0$, then a power
of x would divide $A_j(x)$ and that power of x should have been factored out and
included in $x^{h_j}$). The assumption that there is no common factor of x or y implies
that $A_0$ is not the zero polynomial and that some $h_j = 0$. We have $h_0 \neq 0$, since
P(0, 0) = 0.


Remark 2.2.2 Notice that if two or more of the $h_j$'s in (2.10) were zero, then
P(0, y) would not be identically zero and thus would have at least one nonzero
root r. Consequently, for small values of x there would be a root y(x) of P(x, y)
near to r, that is, we can approximate one branch of the locus P(x, y) = 0 by the
line y = r. Of course, we are interested in branches of the locus that pass through
(0, 0), rather than branches through (0, r), but we will see that each segment of
the Newton polygon allows us to reduce one branch through (0, 0) to this simpler
situation.


Any vertex of the Newton polygon must be of the form $(h_j, j)$, so any line
segment contained in the Newton polygon must contain two or more such points.
We list those points as

$$(h_{j_1}, j_1),\ (h_{j_2}, j_2),\ \ldots,\ (h_{j_\ell}, j_\ell). \tag{2.11}$$

Letting $-1/\alpha$ be the slope of the line segment, we note that if we substitute

$$y(x) = x^{\alpha}\,\tilde{y}(x), \tag{2.12}$$

then we have $x^{h_j} y^j = x^{h_j + \alpha j}\,\tilde{y}^j$ and

$$h_{j_1} + \alpha j_1 = h_{j_2} + \alpha j_2 = \cdots = h_{j_\ell} + \alpha j_\ell \tag{2.13}$$

holds. Let $\beta$ denote the common value in (2.13). For any j such that $A_j$ is not
identically zero and such that the point $(h_j, j)$ is not listed in (2.11), we see that
$h_j + \alpha j > \beta$, this because of convexity of the set K used to define the Newton
polygon.
Thus, by making the substitution (2.12), we obtain

$$P(x, y) = \sum_{j=0}^{M} A_j(x)\, x^{h_j + \alpha j}\, \tilde{y}^j = x^{\beta} \sum_{j=0}^{M} A_j(x)\, x^{h_j + \alpha j - \beta}\, \tilde{y}^j.$$

Since the polynomial in $\tilde{y}$,

$$\tilde{P}(x, \tilde{y}) = \sum_{j=0}^{M} A_j(x)\, x^{h_j + \alpha j - \beta}\, \tilde{y}^j,$$

has two or more of the powers $h_j + \alpha j - \beta$ equal to zero, we find ourselves in the
simpler situation discussed above in Remark 2.2.2, except that the terms that vanish
when x = 0 now may involve positive fractional powers of x rather than only
positive integral powers. Letting r be a nonzero root of $\tilde{P}(0, \tilde{y}) = 0$, we conclude
that a branch of the locus of $\tilde{P}(x, \tilde{y}) = 0$ near (0, r) can be approximated by the
line $\tilde{y} = r$ and, thus, a branch of the locus of P(x, y) = 0 near (0, 0) can be
approximated by the curve $y = x^{\alpha} r$.

For the equation (2.8), there is only one segment in the Newton polygon and it
has a slope of −1. Thus we substitute

$$y = x\tilde{y},$$

and, after eliminating a common factor of x, we find that

$$x^2\tilde{y}^3 + 3ax\tilde{y}^2 + ax\tilde{y} - x^2 + 4a^2\tilde{y} + a^2 = 0. \tag{2.14}$$

The solution of (2.14) near x = 0 satisfies $\tilde{y} \approx -\tfrac{1}{4}$, so we conclude that $y \approx -\tfrac{1}{4}x$
and finally that

$$Y \approx a - \tfrac{1}{4}x.$$

Of course, in this case, the last approximation is the linear approximation readily
obtained using calculus and the implicit function theorem.
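
As a quick numerical check of the approximation above, the sketch below solves Newton's equation (2.1) for Y near the root Y = a by Newton's method and compares the result with a − x/4. The values a = 1 and x = 0.01, and the name `root_near_a`, are illustrative assumptions of ours.

```python
def root_near_a(a, x, iterations=50):
    """Newton's method for Y^3 + a^2*Y - 2*a^3 + a*x*Y - x^3 = 0, started at Y = a."""
    Y = a
    for _ in range(iterations):
        value = Y ** 3 + a * a * Y - 2 * a ** 3 + a * x * Y - x ** 3
        slope = 3 * Y * Y + a * a + a * x  # derivative of the left side with respect to Y
        Y -= value / slope
    return Y

a, x = 1.0, 0.01
exact_root = root_near_a(a, x)
linear_guess = a - x / 4
print(exact_root, linear_guess)
```

The two numbers should agree up to a term of order x^2, as expected of a first-order approximation.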

2.3 Lagrange

Much of Lagrange's fame as a mathematician was owing to his successes in the
study of celestial mechanics. He received numerous prizes for this work, beginning
with the 1764 award given by the Paris Academy of Sciences for his paper
on the libration of the moon.⁴

Figure 2.6. Orbital Parameters

A basic result in celestial mechanics is Kepler's equation

$$E = M + e \sin(E), \tag{2.15}$$

where M is the mean anomaly,⁵ E is the eccentric anomaly, and e is the
eccentricity of the orbit. We will describe these quantities in more detail later. For the
moment, we note that M and e should be considered to be the quantities that can
be measured and that e is assumed to be small. One of Lagrange's theorems, now
called the Lagrange Inversion Theorem, gave a formula for the correction that
must be made when, for some function $\psi(\cdot)$, $\psi(M)$ is replaced by $\psi(E)$. The
correction takes the form of a power series in e. Thus, one can adjust for the
difference between the mean anomaly and the eccentric anomaly. Since Lagrange
was not sensitive to questions of convergence in the way we are today, his proof
amounts to what we would call a “formal power series” argument.


Kepler's Equation. Kepler's (1571-1630) equation is

$$E = M + e \sin(E),$$

where M is the mean anomaly, E is the eccentric anomaly, and e is the eccentricity
of the orbit. Figure 2.6 illustrates the true anomaly, $\omega$, and the eccentric anomaly,
E, of a body, B, moving in an elliptical orbit about a much more massive body
at the focus, F, of the ellipse. The position of the body at a particular time is
indicated by the point B. The pericenter of the orbit, P, is defined to be the point
of nearest approach of the orbiting body to the focus F. The true anomaly is the
angle formed by B, F, and P, that is,

$$\omega = \angle BFP. \tag{2.16}$$

4 The libration of the moon is an irregularity of its motion that allows approximately 59% of the
moon's surface to be visible from the earth.

5 In astronomy, the word “anomaly” refers to the angle between the direction to an orbiting body
and the direction to its last perihelion.


The true anomaly is signed so as to be increasing with time. The circle centered
at the center of the orbit, O, and tangent to the orbit at the pericenter is called
the auxiliary circle. The eccentric anomaly is the angle formed by P, O, and the
point B′ on the auxiliary circle that projects orthogonally onto the major axis of
the ellipse to the same point as does the orbiting body, that is,

$$E = \angle B'OP. \tag{2.17}$$

The eccentric anomaly is also signed so as to be increasing with time.

The eccentricity, e, of the orbit is the ratio of the length OF to the length OP.
In Figure 2.6 the eccentricity is 0.6. The eccentricity of the earth's orbit about the
sun is approximately 0.016, so, were the figure to be a representation of the earth
and the sun, Figure 2.6 would be quite exaggerated.

The mean anomaly does not have a geometric description that can be illustrated 
readily in Figure 2.6. Rather, the mean anomaly is the angle 


$$M = \angle \tilde{B}OP, \tag{2.18}$$


where $\tilde{B}$ is the location of a hypothetical body traveling around the auxiliary circle
with the same period of rotation as the orbiting body, but which is moving with 
constant speed. This hypothetical body is assumed to start from the pericenter 
at the same time (and in the same direction) as the actual orbiting body. The 
hypothetical and actual bodies will again be coincident at the far end of the major 
axis, and will coincide twice in each complete orbit. The mean anomaly is much 
more easily determined than the eccentric anomaly, but the eccentric anomaly is 
more relevant geometrically and physically. 
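
Since the mean anomaly is the measured quantity, the eccentric anomaly has to be recovered from Kepler's equation numerically. The following is a minimal sketch, assuming Newton's method as the root finder (the text does not prescribe one); the function name `eccentric_anomaly` and its tolerance parameters are ours.

```python
import math

def eccentric_anomaly(mean_anomaly, eccentricity, tolerance=1e-14, max_steps=100):
    """Solve E = M + e*sin(E) for E by Newton's method, starting from E = M.
    Assumes 0 <= eccentricity < 1, so that 1 - e*cos(E) never vanishes."""
    E = mean_anomaly
    for _ in range(max_steps):
        residual = E - eccentricity * math.sin(E) - mean_anomaly
        E_next = E - residual / (1.0 - eccentricity * math.cos(E))
        if abs(E_next - E) < tolerance:
            return E_next
        E = E_next
    return E

# Eccentricity 0.6 as in Figure 2.6; the mean anomaly 1.0 radian is an arbitrary choice.
E_figure = eccentric_anomaly(1.0, 0.6)
# The approximate eccentricity of the earth's orbit quoted in the text.
E_earth = eccentric_anomaly(1.0, 0.016)
print(E_figure, E_earth)
```

For a nearly circular orbit such as the earth's, E differs from M only slightly, which is why a power series in e is a natural tool for the correction.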


Lagrange's Theorem. To state and prove Lagrange's theorem, we will need to
use the language of and some results from complex analysis. The reader without 
the requisite background may simply take note of Lagrange’s formula (2.21). 


Theorem 2.3.1 (Lagrange Inversion Theorem [La 69]). Let $\psi(z)$ and $\phi(z)$ be
analytic on the open disc $D(a, r) \subset \mathbb{C}$ and continuous on the closed disc $\overline{D}(a, r)$.
If t is of small enough modulus that

$$|t\,\phi(z)| < |z - a| \tag{2.19}$$

holds for $z \in \partial D(a, r)$, then

$$\zeta = a + t\,\phi(\zeta) \tag{2.20}$$

has exactly one root in D(a, r) and, if that root $\zeta = \zeta(t)$ is considered as a
function of t, then we have

$$\psi(\zeta(t)) = \psi(a) + \sum_{n=1}^{\infty} \frac{t^n}{n!}\, \frac{d^{n-1}}{dz^{n-1}} \Bigl( \psi'(z)\,[\phi(z)]^n \Bigr) \Big|_{z=a}. \tag{2.21}$$

We will give two proofs of Lagrange's theorem. The first proof uses the Cauchy
theory from complex analysis. The second is a proof that is due to Laplace
(1749-1827), and depends heavily on the chain rule of calculus.
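
To see the expansion (2.21) at work, one can apply it to Kepler's equation (2.15) by taking $\psi(z) = z$, $\phi(z) = \sin z$, $a = M$, and $t = e$. The sketch below is an illustration under assumptions: the three terms were worked out by hand from (2.21) (for example, the n = 2 term is $\tfrac{e^2}{2}\tfrac{d}{dM}\sin^2 M = \tfrac{e^2}{2}\sin 2M$), and the reference solution is computed by simple fixed-point iteration, which is our choice and not part of the text.

```python
import math

def lagrange_series(M, e):
    """First three terms of (2.21) for psi(z) = z, phi(z) = sin(z), a = M, t = e."""
    term1 = e * math.sin(M)                                       # n = 1: sin(M)
    term2 = (e ** 2 / 2.0) * math.sin(2 * M)                      # n = 2: (1/2) d/dM sin^2(M)
    term3 = (e ** 3 / 8.0) * (3 * math.sin(3 * M) - math.sin(M))  # n = 3: (1/6) d^2/dM^2 sin^3(M)
    return M + term1 + term2 + term3

def fixed_point_solution(M, e, steps=200):
    """Reference value of E from iterating E <- M + e*sin(E), a contraction for |e| < 1."""
    E = M
    for _ in range(steps):
        E = M + e * math.sin(E)
    return E

M = 1.0
gap_small = abs(lagrange_series(M, 0.016) - fixed_point_solution(M, 0.016))
gap_larger = abs(lagrange_series(M, 0.1) - fixed_point_solution(M, 0.1))
print(gap_small, gap_larger)  # both gaps should be of the order of e**4
```

The truncation error shrinks like the fourth power of e, so for the small eccentricities typical of planetary orbits a few terms already give good accuracy.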

We will need some classical results from complex analysis. The first of these
classical results is the Cauchy integral formula (see Greene and Krantz [GK 97;
page 48]).


Theorem 2.3.2 (Cauchy Integral Formula). Suppose that U is an open set in $\mathbb{C}$
and that f is a holomorphic function on U. Let $z_0 \in U$ and let r > 0 be such that
$\overline{D}(z_0, r) \subset U$. Then, for each $z \in D(z_0, r)$, it holds that

$$f(z) = \frac{1}{2\pi i} \oint_{\partial D(z_0, r)} \frac{f(\zeta)}{\zeta - z}\, d\zeta.$$

The second classical result we will need is Rouché's theorem (see Greene and
Krantz [GK 97; page 168ff.]).


Lemma 2.3.3 (Rouché's theorem). Suppose that $f, g : U \to \mathbb{C}$ are analytic
functions on an open set $U \subset \mathbb{C}$. If $\overline{D}(a, r) \subset U$ and if, for each $z \in \partial D(a, r)$,

$$|f(z) - g(z)| < |f(z)| + |g(z)| \tag{2.22}$$

holds, then the number of zeros of f in D(a, r) equals the number of zeros of g in
D(a, r), counting multiplicities.


Proof of Lagrange's Inversion Theorem 2.3.1 using the Cauchy Integral
Formula. We will make the simplifying assumption that

$$\phi(z) \neq 0 \ \text{in}\ D(a, r). \tag{2.23}$$

By Lemma 2.3.3, applied with $f(z) = z - a$ and with $g(z) = z - a - t\phi(z)$,
we see that (2.20) has exactly one root $\zeta$ in D(a, r).

Fix t and $\zeta = \zeta(t)$ satisfying (2.20). We set

$$\theta(z) = \frac{z - a}{\phi(z)}. \tag{2.24}$$

We have $\theta(\zeta) = t$.

Note that $\theta(z)$ is analytic on D(a, r) and that $\theta(z) - \theta(\zeta)$ has its only zero at
$z = \zeta$. We can write

$$\theta(z) = \theta(\zeta) + (z - \zeta) R(z),$$

where R(z) is nonvanishing in D(a, r). Using the Cauchy integral formula and
the fact that $R(\zeta) = \theta'(\zeta)$, we compute

$$\frac{1}{2\pi i} \oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)}{\theta(z) - \theta(\zeta)}\, dz
= \frac{1}{2\pi i} \oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)}{(z - \zeta) R(z)}\, dz
= \frac{\psi(\zeta)\theta'(\zeta)}{R(\zeta)} = \psi(\zeta).$$




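The condition (2.19) is equivalent to $|\theta(\zeta)| < |\theta(z)|$, so we have

$$\frac{\theta'(z)}{\theta(z) - \theta(\zeta)} = \sum_{n=0}^{\infty} \frac{\theta'(z)\,[\theta(\zeta)]^n}{[\theta(z)]^{n+1}}. \tag{2.25}$$

Thus, we have

$$\begin{aligned}
\psi(\zeta) &= \frac{1}{2\pi i} \oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)}{\theta(z) - \theta(\zeta)}\, dz \\
&= \sum_{n=0}^{\infty} \frac{1}{2\pi i} \oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)\,[\theta(\zeta)]^n}{[\theta(z)]^{n+1}}\, dz \\
&= \sum_{n=0}^{\infty} \frac{t^n}{2\pi i} \oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)}{[\theta(z)]^{n+1}}\, dz \\
&= \psi(a) + \sum_{n=1}^{\infty} \frac{t^n}{2\pi i} \oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)}{[\theta(z)]^{n+1}}\, dz.
\end{aligned}$$

Integration by parts gives us

$$\oint_{\partial D(a,r)} \frac{\psi(z)\theta'(z)}{[\theta(z)]^{n+1}}\, dz = \frac{1}{n} \oint_{\partial D(a,r)} \frac{\psi'(z)}{[\theta(z)]^{n}}\, dz.$$

So we have

$$\psi(\zeta) = \psi(a) + \sum_{n=1}^{\infty} \frac{t^n}{2\pi i\, n} \oint_{\partial D(a,r)} \frac{\psi'(z)}{[\theta(z)]^{n}}\, dz.$$

Using equation (2.24), we have

$$\begin{aligned}
\psi(\zeta) &= \psi(a) + \sum_{n=1}^{\infty} \frac{t^n}{2\pi i\, n} \oint_{\partial D(a,r)} \frac{\psi'(z)}{[\theta(z)]^{n}}\, dz \\
&= \psi(a) + \sum_{n=1}^{\infty} \frac{t^n}{2\pi i\, n} \oint_{\partial D(a,r)} \frac{\psi'(z)\,[\phi(z)]^n}{(z - a)^n}\, dz \\
&= \psi(a) + \sum_{n=1}^{\infty} \frac{t^n}{n!}\, \frac{d^{n-1}}{da^{n-1}} \Bigl( \psi'(a)\,[\phi(a)]^n \Bigr). \qquad \square
\end{aligned}$$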


We now present a second proof of Theorem 2.3.1 that is longer and less 
self-contained than the first. It utilizes some interesting new ideas, including the 
Schwarz–Pick lemma from complex variable theory, which we state next (see
Greene and Krantz [GK 97; page 174]).


Lemma 2.3.4 (Schwarz–Pick Lemma). Let h be analytic on the open unit disc in
$\mathbb{C}$. If

$$|h(z)| \le 1 \ \text{for all}\ |z| < 1, \qquad h(c) = d,$$

then

$$|h'(c)| \le \frac{1 - |d|^2}{1 - |c|^2}. \tag{2.26}$$

In case equality holds in (2.26), then h is of the form

$$h(z) = \omega\, \frac{z - a}{1 - \bar{a} z},$$

for some complex numbers a and $\omega$ with |a| < 1, $|\omega| = 1$, and if, additionally,
c = d = 0, then a = 0.


In the proof given below for Theorem 2.3.1, we will also need to apply the
maximum modulus theorem from complex analysis (see Greene and Krantz [GK 97;
page 172]).


Theorem 2.3.5 (Maximum Modulus Theorem). Let $U \subset \mathbb{C}$ be a bounded, open,
connected set. Let f be a continuous function on $\overline{U}$ that is holomorphic on U.
Then the maximum value of |f| on $\overline{U}$ must occur on $\partial U$.


Proof that $\zeta$ is analytic. We again begin by applying Rouché's Theorem 2.3.3
with $f(z) = z - a$ and $g(z) = z - a - t\phi(z)$ to see that (2.20) has exactly one root $\zeta$ in
D(a, r).

Now fix t and $\zeta$, the corresponding root of (2.20). We will apply Lemma 2.3.4
to

$$h(z) = \frac{t}{r}\, \phi\!\left( a + r\, \frac{rz + \zeta - a}{r + z\,\overline{(\zeta - a)}} \right), \qquad z \in D(0, 1). \tag{2.27}$$

The function

$$z \mapsto a + r\, \frac{rz + \zeta - a}{r + z\,\overline{(\zeta - a)}}$$

maps D(0, 1) to D(a, r), sending 0 to $\zeta$. So $|h(z)| < |z - a|/r = 1$ persists
for $z \in \partial D(0, 1)$, and the inequality holds on the interior of the unit disc by the
maximum modulus theorem. Setting c = 0, $d = (\zeta - a)/r$ in Lemma 2.3.4, we
conclude that

$$|h'(0)| \le \frac{r^2 - |\zeta - a|^2}{r^2} \le 1. \tag{2.28}$$

Now $|h'(0)| = 1$ implies both that $\zeta = a$ and that the case of equality has
occurred in Lemma 2.3.4. So by the uniqueness part of Lemma 2.3.4, we can
conclude that $h(z) = \omega z$, for some complex constant $\omega$ of modulus 1. It follows
then that $\phi(z) = (\omega/t)(z - a)$, contradicting (2.19). Thus, we must have $|h'(0)| < 1$.

The inequality $|h'(0)| < 1$ implies that $1 - t\phi'(\zeta) \neq 0$, which is exactly the
condition we need to apply the complex analytic form of the implicit function
theorem (to be presented in the next section) to conclude that $\zeta$ is an analytic
function of t. Indeed, for future purposes, we note that $1 - t\phi'(\zeta) \neq 0$ shows that
$\zeta$ is an analytic function of both t and a.


It remains to show that Lagrange’s expansion (2.21) is valid. 


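Laplace's Proof of Lagrange's Expansion (2.21). Computing the partial
derivatives of (2.20) with respect to a and t, we obtain

$$1 = [1 - t\phi'(\zeta)]\, \frac{\partial \zeta}{\partial a}, \tag{2.29}$$

$$\phi(\zeta) = [1 - t\phi'(\zeta)]\, \frac{\partial \zeta}{\partial t}. \tag{2.30}$$

Writing

$$u(a, t) = \psi[\zeta(a, t)],$$

we find that

$$\frac{\partial u}{\partial t} = \phi(\zeta)\, \frac{\partial u}{\partial a}. \tag{2.31}$$

We will also need the general identity

$$\frac{\partial}{\partial t} \left[ F(\zeta)\, \frac{\partial u}{\partial a} \right] = \frac{\partial}{\partial a} \left[ F(\zeta)\, \frac{\partial u}{\partial t} \right] \tag{2.32}$$

that holds for any differentiable F. To see that (2.32) holds, we compute, on the
one hand,

$$\frac{\partial}{\partial t} \left[ F(\zeta)\, \frac{\partial u}{\partial a} \right]
= F'(\zeta)\, \frac{\partial \zeta}{\partial t}\, \frac{\partial u}{\partial a} + F(\zeta)\, \frac{\partial^2 u}{\partial t\, \partial a}
= F'(\zeta)\, \phi(\zeta)\, \frac{\partial \zeta}{\partial a}\, \frac{\partial u}{\partial a} + F(\zeta)\, \frac{\partial^2 u}{\partial t\, \partial a},$$

and, on the other hand,

$$\frac{\partial}{\partial a} \left[ F(\zeta)\, \frac{\partial u}{\partial t} \right]
= F'(\zeta)\, \frac{\partial \zeta}{\partial a}\, \frac{\partial u}{\partial t} + F(\zeta)\, \frac{\partial^2 u}{\partial a\, \partial t}
= F'(\zeta)\, \phi(\zeta)\, \frac{\partial \zeta}{\partial a}\, \frac{\partial u}{\partial a} + F(\zeta)\, \frac{\partial^2 u}{\partial a\, \partial t}.$$

To verify the Lagrange expansion, we need to show that the coefficients in the
Taylor series for u(a, t), considered as a function of t, are as given in (2.21); that
is, we need to show that

$$\frac{\partial^n u}{\partial t^n} = \frac{\partial^{n-1}}{\partial a^{n-1}} \left[ (\phi(\zeta))^n\, \frac{\partial u}{\partial a} \right] \tag{2.33}$$

holds for n = 1, 2, .... This will be proved by induction on n. Note that (2.33)
holds when n = 1 by (2.31).

To see that the inductive step is true, suppose that (2.33) holds for the positive
integer n. We compute

$$\begin{aligned}
\frac{\partial^{n+1} u}{\partial t^{n+1}}
&= \frac{\partial}{\partial t} \left( \frac{\partial^{n-1}}{\partial a^{n-1}} \left[ (\phi(\zeta))^n\, \frac{\partial u}{\partial a} \right] \right) \\
&= \frac{\partial^{n-1}}{\partial a^{n-1}} \left( \frac{\partial}{\partial t} \left[ (\phi(\zeta))^n\, \frac{\partial u}{\partial a} \right] \right) \\
&= \frac{\partial^{n-1}}{\partial a^{n-1}} \left( \frac{\partial}{\partial a} \left[ (\phi(\zeta))^n\, \frac{\partial u}{\partial t} \right] \right) && \text{by (2.32)} \\
&= \frac{\partial^{n}}{\partial a^{n}} \left[ (\phi(\zeta))^n\, \frac{\partial u}{\partial t} \right] \\
&= \frac{\partial^{n}}{\partial a^{n}} \left[ (\phi(\zeta))^{n+1}\, \frac{\partial u}{\partial a} \right] && \text{by (2.31)},
\end{aligned}$$

which verifies that (2.33) holds for n + 1. □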


2.4 Cauchy 


As mentioned in Section 2.1, Cauchy (1789-1857) is credited with the first rig- 
orous form of the implicit function theorem. The next theorem, in the context of 
holomorphic functions, proves the existence of an implicitly defined function un- 
der the now standard hypotheses and also gives an integral representation for the 
function; the argument used in the proof is due to Cauchy.⁶ As in the statement
and proof of Lagrange’s theorem, techniques from complex analysis will be used 
in this section. The reader without background in that area may wish to skip this 
section. The final result in this section applies to formal power series. 


Theorem 2.4.1 Suppose that F(x, y) is holomorphic in the bidisc $D(x_0, R_1) \times
D(y_0, R_2) \subset \mathbb{C}^2$ and write

$$D_2F = \frac{\partial F}{\partial y}. \tag{2.34}$$

If

$$F(x_0, y_0) = 0 \ \text{and}\ D_2F(x_0, y_0) \neq 0, \tag{2.35}$$

then there is a disc $D(x_0, r_0)$ and a unique holomorphic function f(x) defined on
$D(x_0, r_0)$ with $f(x_0) = y_0$ and such that

$$F(x, f(x)) = 0 \tag{2.36}$$

holds for $x \in D(x_0, r_0)$. Moreover, that function f(x) is represented by

$$f(x) = \frac{1}{2\pi i} \oint_C \frac{y\, D_2F(x, y)}{F(x, y)}\, dy, \tag{2.37}$$

where $C = \partial D(y_0, r_1)$ is a suitably chosen circle.

6 Cauchy [Ca 16; page 74ff.]


Proof. By the hypotheses (2.35) we see that, as a function of y, $F(x_0, y)$ has a
simple zero at $y = y_0$. It follows that there exists $0 < r_1 < R_2$ such that

$$F(x_0, y) \neq 0 \ \text{holds for}\ 0 < |y - y_0| \le r_1. \tag{2.38}$$

In particular, we have

$$0 < \inf\{ |F(x_0, y)| : |y - y_0| = r_1 \}. \tag{2.39}$$

Since F(x, y) approaches $F(x_0, y)$, uniformly in y, as x approaches $x_0$, and
because of (2.39), we can select a number $0 < r_0 < R_1$ such that

$$|F(x, y) - F(x_0, y)| < |F(x_0, y)| \ \text{holds for}\ |x - x_0| < r_0,\ |y - y_0| = r_1. \tag{2.40}$$

Now, by Rouché's theorem (i.e., Lemma 2.3.3) and (2.40), for each fixed x with
$|x - x_0| < r_0$, the functions $F(x, \cdot)$ and $F(x_0, \cdot)$ have the same number of zeros
in the disc $D(y_0, r_1)$, and since $F(x_0, \cdot)$ has exactly one zero there, it follows that
$F(x, \cdot)$ also has exactly one zero, which we may denote by f(x).

It is evident that, for fixed $x \in D(x_0, r_0)$, the residue of

$$\frac{y\, D_2F(x, y)}{F(x, y)}$$

as a function of y at the point y = f(x) is just f(x), so the representation (2.37)
holds. The fact that f(x) is a holomorphic function of x then follows by
differentiating (2.37), with respect to x, under the integral sign. □
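
The integral representation (2.37) lends itself to a direct numerical check. The sketch below is an illustration under assumptions of our own: the test function F(x, y) = y − y² − x with $x_0 = y_0 = 0$, the radius $r_1 = 0.5$, and the use of the trapezoid rule on the circle are not taken from the text. For this F the implicit function is known in closed form, $f(x) = (1 - \sqrt{1 - 4x})/2$, which gives something to compare against.

```python
import cmath
import math

# Hypothetical example: F(x, y) = y - y^2 - x, with F(0, 0) = 0 and D2F(0, 0) = 1.
def F(x, y):
    return y - y * y - x

def D2F(x, y):
    return 1 - 2 * y

def implicit_value(x, radius=0.5, samples=256):
    """Approximate (1 / (2*pi*i)) * integral over |y| = radius of y*D2F/F dy
    with the trapezoid rule on equally spaced points of the circle."""
    total = 0j
    for k in range(samples):
        y = radius * cmath.exp(2j * math.pi * k / samples)
        dy = 1j * y * (2 * math.pi / samples)  # derivative of the parametrization times the step
        total += y * D2F(x, y) / F(x, y) * dy
    return total / (2j * math.pi)

x = 0.05
approx = implicit_value(x)
exact = (1 - math.sqrt(1 - 4 * x)) / 2  # the root of y - y^2 - x = 0 that vanishes at x = 0
print(approx, exact)
```

With radius 0.5 the circle separates the root near 0 from the second root near 1, and the bound (2.40) holds for |x| = 0.05 because |F(0, y)| is at least 0.25 on the circle.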


Remark 2.4.2 The proof given above can also be adapted to the situation in
which $F(x_0, y)$ has a zero of multiplicity n > 1 at $y_0$. In this case, for each
fixed $x \in D(x_0, r_0)$, it is the sum of the zeros of $F(x, \cdot)$ in $D(y_0, r_1)$ that is given
by the right-hand side of (2.37); of course the zeros must be counted according to
their multiplicities. In fact, Cauchy dealt extensively with this form of the result.


Cauchy also gave a proof of the implicit function theorem by means of
majorants.⁷ The proof by the method of majorants is equally applicable to real analytic
functions and holomorphic functions, since only the convergence of power series
is at issue. A complete treatment of the real analytic implicit function theorem,
together with its connections to the complex holomorphic implicit function
theorem, appears in [KP 92].

The method of majorants is also the key tool in the proof of the
Cauchy–Kowalewsky theorem (Sonja Kowalewsky: 1850-1891) on the existence of
solutions of certain partial differential equations (see Courant and Hilbert [CH 62;
Chapter 1, Section 7] or Krantz and Parks [KP 92; Sections 1.7 and 1.10]).

7 What is now known as the “method of majorants” was called the calcul des limites by Cauchy.

We will need a result from several complex variables which allows us to bound 
the coefficients in a convergent power series (see Krantz [Kr 92; Section 2.3]); 
this result is a consequence of the Cauchy estimates in several variables. 


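Lemma 2.4.3 If

$$f(x_1, x_2, \ldots, x_n) = \sum_{j_1, j_2, \ldots, j_n = 0}^{\infty} \gamma_{j_1, j_2, \ldots, j_n}\, x_1^{j_1} x_2^{j_2} \cdots x_n^{j_n}$$

is absolutely convergent for $|x_1| < R_1$, $|x_2| < R_2$, ..., $|x_n| < R_n$ and if

$$M = \sup\{ |f(x)| : x \in D(0, R_1) \times D(0, R_2) \times \cdots \times D(0, R_n) \},$$

then

$$|\gamma_{j_1, j_2, \ldots, j_n}| \le \frac{M}{R_1^{j_1} R_2^{j_2} \cdots R_n^{j_n}}$$

holds for $j_1, j_2, \ldots, j_n \in \{0, 1, \ldots\}$.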

Theorem 2.4.4 Suppose the power series

$$F(x, y) = \sum_{j,k=0}^{\infty} a_{jk}\, x^j y^k \tag{2.41}$$

is absolutely convergent for $|x| < R_1$, $|y| < R_2$. If

$$a_{00} = 0 \ \text{and}\ a_{01} \neq 0, \tag{2.42}$$

then there exist $r_0 > 0$ and a power series

$$f(x) = \sum_{j=1}^{\infty} c_j\, x^j \tag{2.43}$$

such that (2.43) is absolutely convergent for $|x| < r_0$ and

$$F(x, f(x)) = 0. \tag{2.44}$$


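Proof. It will be no loss of generality to assume $a_{01} = 1$, so that (2.41) takes the
form

$$F(x, y) = y + \sum_{j=1}^{\infty} (a_{j0} + a_{j1} y)\, x^j + \sum_{j=0}^{\infty} \sum_{k=2}^{\infty} a_{jk}\, x^j y^k. \tag{2.45}$$

Introducing the notation $b_{jk} = -a_{jk}$, we can rewrite the equation F(x, y) = 0 as

$$y = \sum_{j=1}^{\infty} (b_{j0} + b_{j1} y)\, x^j + \sum_{j=0}^{\infty} \sum_{k=2}^{\infty} b_{jk}\, x^j y^k \tag{2.46}$$

or y = B(x, y), where

$$B(x, y) = \sum_{j=1}^{\infty} (b_{j0} + b_{j1} y)\, x^j + \sum_{j=0}^{\infty} \sum_{k=2}^{\infty} b_{jk}\, x^j y^k. \tag{2.47}$$

Substituting y = f(x) into (2.46) with f(x) given by (2.43) we obtain

$$\sum_{j=1}^{\infty} c_j x^j = \sum_{j=1}^{\infty} \left[ b_{j0} + b_{j1} \sum_{\ell=1}^{\infty} c_\ell x^\ell \right] x^j + \sum_{j=0}^{\infty} \sum_{k=2}^{\infty} b_{jk}\, x^j \left( \sum_{\ell=1}^{\infty} c_\ell x^\ell \right)^{k}. \tag{2.48}$$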


If all the series in (2.48) are ultimately shown to be absolutely convergent, then
the order of summation can be freely rearranged. Assuming absolute convergence,
we can equate like powers of x on the left-hand and right-hand sides of (2.48) and
obtain the following sequence of recurrence relations that must hold:

$$\begin{aligned}
c_1 &= b_{10}, \\
c_2 &= b_{20} + b_{11} c_1 + b_{02} (c_1)^2, \\
c_3 &= b_{30} + b_{21} c_1 + b_{11} c_2 + b_{12} (c_1)^2 + b_{03} (c_1)^3 + 2 b_{02} c_1 c_2, \\
c_J &= b_{J0} + b_{J-1,1}\, c_1 + \cdots + b_{1,1}\, c_{J-1}
      + \sum \frac{k!}{k_1!\, k_2! \cdots k_p!}\, b_{jk}\, (c_1)^{k_1} (c_2)^{k_2} \cdots (c_p)^{k_p},
\end{aligned} \tag{2.49}$$

where the last summation extends over $j \in \{0, 1, \ldots\}$, $k \in \{2, 3, \ldots\}$, $p \in
\{1, 2, \ldots\}$, and $k_1, k_2, \ldots, k_p \in \{0, 1, \ldots\}$ such that $k_1 + k_2 + \cdots + k_p = k$ and

$$j + k_1 + 2k_2 + \cdots + p k_p = J.$$
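
The first three relations in (2.49) are easy to test on examples where the implicit function is known. The sketch below is only an illustration: the function name `first_three_coefficients`, the dictionary representation of the coefficients $b_{jk}$, and the two test equations are assumptions of ours.

```python
from fractions import Fraction

def first_three_coefficients(b):
    """c_1, c_2, c_3 from the first three relations of (2.49).
    `b` maps (j, k) to b_jk; missing entries are treated as zero."""
    def coef(j, k):
        return Fraction(b.get((j, k), 0))
    c1 = coef(1, 0)
    c2 = coef(2, 0) + coef(1, 1) * c1 + coef(0, 2) * c1 ** 2
    c3 = (coef(3, 0) + coef(2, 1) * c1 + coef(1, 1) * c2
          + coef(1, 2) * c1 ** 2 + coef(0, 3) * c1 ** 3
          + 2 * coef(0, 2) * c1 * c2)
    return c1, c2, c3

# y = x + y^2, whose solution (1 - sqrt(1 - 4x))/2 = x + x^2 + 2x^3 + ...
# has the Catalan numbers as coefficients.
catalan_case = first_three_coefficients({(1, 0): 1, (0, 2): 1})

# y = x - x*y, i.e. y = x / (1 + x) = x - x^2 + x^3 - ...
geometric_case = first_three_coefficients({(1, 0): 1, (1, 1): -1})
print(catalan_case, geometric_case)
```

Each coefficient depends only on earlier ones, which is what makes the relations a genuine recurrence.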


While the recurrence relations (2.49) uniquely determine the coefficients $c_j$ in
the power series for the implicit function, it is also necessary to show that (2.43)
is convergent. The easiest way to obtain the needed estimates is by using the
method of majorants, which we describe next. Consider two power series in the
same number of variables:

$$\Phi(x_1, x_2, \ldots, x_p) = \sum_{j_1, j_2, \ldots, j_p = 0}^{\infty} \phi_{j_1, j_2, \ldots, j_p}\, x_1^{j_1} x_2^{j_2} \cdots x_p^{j_p}, \tag{2.50}$$

$$\Psi(x_1, x_2, \ldots, x_p) = \sum_{j_1, j_2, \ldots, j_p = 0}^{\infty} \psi_{j_1, j_2, \ldots, j_p}\, x_1^{j_1} x_2^{j_2} \cdots x_p^{j_p}. \tag{2.51}$$

We say that $\Psi(x_1, x_2, \ldots, x_p)$ is a majorant of $\Phi(x_1, x_2, \ldots, x_p)$ if

$$|\phi_{j_1, j_2, \ldots, j_p}| \le \psi_{j_1, j_2, \ldots, j_p} \tag{2.52}$$

holds for all $j_1, j_2, \ldots, j_p$.

Because all the coefficients $k!/(k_1!\, k_2! \cdots k_p!)$ in (2.49) are non-negative, if

$$G(x, y) = \sum_{j,k=0}^{\infty} g_{jk}\, x^j y^k,$$

with $g_{00} = g_{01} = 0$, is a majorant of

$$B(x, y) = \sum_{j=1}^{\infty} (b_{j0} + b_{j1} y)\, x^j + \sum_{j=0}^{\infty} \sum_{k=2}^{\infty} b_{jk}\, x^j y^k$$

and if

$$h(x) = \sum_{j=1}^{\infty} h_j\, x^j \tag{2.53}$$

solves

$$h(x) = G[x, h(x)], \tag{2.54}$$

then h(x) will be a majorant of f(x). Consequently, if the series (2.53) for h(x)
is convergent, then the series (2.43) is convergent and its radius of convergence is
at least as large as the radius of convergence for (2.53).


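We take

$$\begin{aligned}
G(x, y) &= M \left[ \frac{1}{(1 - x/R_1)(1 - y/R_2)} - 1 - \frac{y}{R_2} \right] \\
&= M \left[ \sum_{j,k=0}^{\infty} \frac{x^j y^k}{R_1^j R_2^k} - 1 - \frac{y}{R_2} \right] \\
&= M \left[ \sum_{j=1}^{\infty} \left( 1 + \frac{y}{R_2} \right) \frac{x^j}{R_1^j} + \sum_{j=0}^{\infty} \sum_{k=2}^{\infty} \frac{x^j y^k}{R_1^j R_2^k} \right],
\end{aligned}$$

where

$$M = \sup\{ |B(x, y)| : x \in D(0, R_1),\ y \in D(0, R_2) \}.$$

We see that G is a majorant of B by Lemma 2.4.3. For this choice of majorant,
(2.54) can be solved explicitly and the solution is clearly holomorphic at x = 0.
In fact, y = h(x) is easily seen to be the solution of the quadratic equation

$$\frac{M + R_2}{M (R_2)^2}\, y^2 - \frac{1}{M}\, y + \left[ \left( 1 - \frac{x}{R_1} \right)^{-1} - 1 \right] = 0. \tag{2.55}$$

□

Remark 2.4.5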


1. By using a smaller majorant a better estimate on the radius of convergence 
of f can be obtained (see Hille [Hi 59; pages 272—273]). 


2. The recurrence relations (2.49) constitute the implicit function theorem for 
formal power series. For completeness we state the formal power series 
theorem below. 


3. In Section 6.1 we treat the N-dimensional version of the ideas formulated 
in Theorem 2.4.4. 
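
The recurrence relations (2.49) can be carried out mechanically. The following is a minimal Python sketch, not the authors' method: instead of expanding the multinomial sums, it iterates $f \mapsto B(x, f)$ on truncated power series, which fixes one more coefficient per pass because $b_{00} = b_{01} = 0$. The function name `implicit_series` and the dictionary encoding of the coefficients $b_{jk}$ are assumptions made for this illustration.

```python
from fractions import Fraction


def implicit_series(b, order):
    """Return [c_0, ..., c_order] for the series f solving f(x) = B(x, f(x)).

    b maps (j, k) to the coefficient b_jk of x^j y^k in B (hypothetical
    encoding). As in the text, b_00 and b_01 must be zero.
    """
    assert not b.get((0, 0)) and not b.get((0, 1))
    kmax = max(k for (_, k) in b)

    def mul(p, q):
        # Product of two power series, truncated after x^order.
        r = [Fraction(0)] * (order + 1)
        for i, pi in enumerate(p):
            if pi:
                for j, qj in enumerate(q[: order + 1 - i]):
                    r[i + j] += pi * qj
        return r

    f = [Fraction(0)] * (order + 1)
    for _ in range(order):  # pass n makes the coefficients up to x^n correct
        powers = [[Fraction(1)] + [Fraction(0)] * order]  # f^0
        for _k in range(kmax):
            powers.append(mul(powers[-1], f))             # f^1, f^2, ...
        new = [Fraction(0)] * (order + 1)
        for (j, k), coeff in b.items():
            for i in range(order + 1 - j):                # add b_jk x^j f^k
                new[i + j] += coeff * powers[k][i]
        f = new
    return f


# B(x, y) = x + y^2, so f = x + f^2.
print(implicit_series({(1, 0): 1, (0, 2): 1}, 5))
```

For $B(x, y) = x + y^2$ the first relations of (2.49) give $c_1 = b_{10} = 1$, $c_2 = b_{02}(c_1)^2 = 1$ and $c_3 = 2 b_{02} c_1 c_2 = 2$, which agrees with what the sketch computes.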


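Theorem 2.4.6 Suppose the formal power series

\[
F(x, y) = \sum_{j,k=0}^{\infty} a_{jk}\, x^j y^k \tag{2.56}
\]

satisfies

\[
a_{00} = 0 \quad \text{and} \quad a_{01} \neq 0. \tag{2.57}
\]

Then there exists a unique formal power series $f(x) = \sum_{j=0}^{\infty} c_j x^j$ satisfying
$F(x, f(x)) = 0$, and that series has its coefficients given by the following
recurrence relations:

\[
\begin{aligned}
c_0 &= 0, \\
c_1 &= -\frac{a_{10}}{a_{01}}, \\
c_2 &= -\frac{a_{20}}{a_{01}} - \frac{a_{11}}{a_{01}}\, c_1 - \frac{a_{02}}{a_{01}}\, (c_1)^2, \\
c_3 &= -\frac{a_{30}}{a_{01}} - \frac{a_{21}}{a_{01}}\, c_1 - \frac{a_{12}}{a_{01}}\, (c_1)^2
       - \frac{a_{03}}{a_{01}}\, (c_1)^3 - \frac{a_{11}}{a_{01}}\, c_2 - 2\, \frac{a_{02}}{a_{01}}\, c_1 c_2, \\
c_J &= -\frac{a_{J0}}{a_{01}} - \frac{a_{(J-1)1}}{a_{01}}\, c_1 - \cdots - \frac{a_{11}}{a_{01}}\, c_{J-1}
       - \sum \frac{k!}{k_1!\, k_2! \cdots k_p!}\, \frac{a_{jk}}{a_{01}}\, (c_1)^{k_1} (c_2)^{k_2} \cdots (c_p)^{k_p},
\end{aligned}
\tag{2.58}
\]

where the summation in (2.58) extends over

\[
j \in \{0, 1, \ldots\}, \quad k \in \{2, 3, \ldots\}, \quad p \in \{1, 2, \ldots\},
\]

and

\[
k_1, k_2, \ldots, k_p \in \{0, 1, \ldots\}
\]

such that

\[
j + k_1 + 2 k_2 + \cdots + p\, k_p = J.
\]

Prior to this text, the history of the implicit function theorem was not generally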
well known. Most of the material that we present comes from primary sources. 
The resulting tapestry illustrates a striking synergy among the different contribu- 
tors to the idea of what we now think of as a basic theorem of the calculus. 

We hope that the genesis presented here will give the reader some context and 
motivation for what follows. 


3 


Basic Ideas 


3.1 Introduction 


In order to make this book a convenient reference, we shall endeavor to make it 
locally self-contained. With this thought in mind, we shall begin by presenting a 
very classical treatment of the implicit function theorem in Euclidean space. 

There are two basic points of view in this classical setting. The first is to prove 
the implicit function theorem as an exercise in calculus (the Taylor expansion, the 
mean-value theorem, and estimates on derivatives) and the second is to use the 
Contraction Mapping Fixed Point Principle from elementary functional analysis 
to get a quick, soft, and easy proof of the implicit function theorem. 

We will give two proofs of the elementary calculus type. The first illustrates the 
original proof of the real-variable implicit function theorem. The result is obtained 
for just one dependent variable, but an arbitrary number of independent variables; 
then the general result is obtained by induction on the number of dependent vari- 
ables. Our second proof based on elementary calculus looks directly at the linear 
approximation of the mapping provided by calculus. This second proof reveals 
the inner workings of the theorem. 

The functional analysis approach to the implicit function theorem will provide 
us with our third proof of the result. This method of proof has the disadvantage of 
being more abstract. On the other hand, that very abstraction allows the argument 
to be applied in other settings, thus automatically yielding variants of the implicit 
function theorem in categories other than $C^k$.


3.2 The Inductive Proof of the Implicit
Function Theorem


In this section, we present a proof of the implicit function theorem that is es- 
sentially the one given by Dini in the 1870s. The proof proceeds by induction 
on the number of dependent variables. The basis of the induction—the implicit 
function theorem for one dependent variable, one equation, and any number of 
independent variables—is relatively easy. The induction step is accomplished by 
distinguishing one equation and one dependent variable to which the base case 
can be applied, but with all the other dependent variables treated as if they were 
independent. The resulting implicitly defined function then is substituted in the 
other equations, thereby reducing the number of dependent variables and equa- 
tions by one. 

First, we state and prove the implicit function theorem for one dependent vari- 
able and one equation, but any number of independent variables. The proof relies 
on the intermediate value theorem and the use of a nonvanishing derivative to in- 
sure monotonicity. The hypotheses can be weakened a bit as in Young [Yo 09a] 
while still maintaining the same general method of proof. 


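Theorem 3.2.1 If $W \subseteq \mathbb{R}^m$ is open, $F : W \to \mathbb{R}$ is continuously differentiable,
and $p = (p', q) \in \mathbb{R}^{m-1} \times \mathbb{R}$, where $p \in W$ is a point for which

\[
F(p) = 0 \quad \text{and} \quad \frac{\partial F}{\partial x_m}(p) \neq 0, \tag{3.1}
\]

then there exists an open set $W' \subseteq \mathbb{R}^{m-1}$ with $p' \in W'$ and a unique continuously
differentiable function $\psi : W' \to \mathbb{R}$ such that $q = \psi(p')$ and

\[
F[x', \psi(x')] = 0 \tag{3.2}
\]

holds for $x' \in W'$.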


Proof. Without loss of generality, we may assume that

\[
\frac{\partial F}{\partial x_m}(p) > 0. \tag{3.3}
\]

By the continuity of $\partial F/\partial x_m$, and by passing to a smaller neighborhood $W$ of $p$
if necessary, but without changing notation, we may assume that

\[
\frac{\partial F}{\partial x_m}(x) > 0 \tag{3.4}
\]

holds for all $x \in W$.

Since (by (3.4)) $F(p', \cdot)$ is an increasing function in an interval about $q$, we
can find $q_1 < q < q_2$ so that

\[
F(p', q_1) < 0 < F(p', q_2) \tag{3.5}
\]

holds. Using the continuity of $F$, we can find a neighborhood $W'$ of $p'$ so that
$W' \times [q_1, q_2] \subseteq W$ and

\[
F(x', q_1) < 0 < F(x', q_2) \tag{3.6}
\]

holds for $x' \in W'$. It follows from the intermediate value theorem that for each
$x' \in W'$ there is a number $y$ with $q_1 < y < q_2$ so that $F(x', y) = 0$ and by (3.4)
that number $y$ is unique. We let that $y$ be $\psi(x')$ and note that the uniqueness of
this value $y$ also implies that $\psi$ is continuous.
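To complete the proof, we need to show that

\[
\frac{\partial \psi}{\partial x_j} = -\frac{\partial F}{\partial x_j} \Big/ \frac{\partial F}{\partial x_m} \tag{3.7}
\]

holds, for $j = 1, 2, \ldots, m - 1$. We do this with a very direct argument. Specifically,
fixing the point $x' \in W'$ and setting $y = \psi(x')$, we have

\[
F(x' + s e_j, y + t) - F(x', y)
  = s\, \frac{\partial F}{\partial x_j}(x', y) + t\, \frac{\partial F}{\partial x_m}(x', y)
  + \epsilon \sqrt{s^2 + t^2}, \tag{3.8}
\]

where $\epsilon = \epsilon(s, t)$ approaches 0 as $\sqrt{s^2 + t^2}$ approaches 0. Now, taking
$t = \psi(x' + s e_j) - \psi(x')$ in (3.8), so that both values of $F$ on the left vanish,
we find that

\[
t\, \frac{\partial F}{\partial x_m}(x', y)
  = -s\, \frac{\partial F}{\partial x_j}(x', y) - \epsilon \sqrt{s^2 + t^2}. \tag{3.9}
\]

So we have

\[
|t| \left| \frac{\partial F}{\partial x_m}(x', y) \right|
  \le |s| \left| \frac{\partial F}{\partial x_j}(x', y) \right| + |\epsilon|\, |s| + |\epsilon|\, |t|. \tag{3.10}
\]

For small enough choice of $|s|$, we have $|\epsilon| \le \tfrac{1}{2}\, |\partial F/\partial x_m (x', y)|$ and
$|\epsilon| \le 2\, |\partial F/\partial x_j (x', y)|$, so that

\[
|t| \le 6\, |s| \left| \frac{\partial F}{\partial x_j}(x', y) \right| \Big/ \left| \frac{\partial F}{\partial x_m}(x', y) \right| \tag{3.11}
\]

holds. Thus we see that

\[
\left| \frac{\psi(x' + s e_j) - \psi(x')}{s}
  + \left[ \frac{\partial F}{\partial x_j}(x', y) \right] \Big/ \left[ \frac{\partial F}{\partial x_m}(x', y) \right] \right|
\le |\epsilon| \left( 1 + 6 \left| \frac{\partial F}{\partial x_j}(x', y) \right| \Big/ \left| \frac{\partial F}{\partial x_m}(x', y) \right| \right)
  \left| \frac{\partial F}{\partial x_m}(x', y) \right|^{-1}. \tag{3.12}
\]

Taking the limit as $s \to 0$ in (3.12), we see that $\partial \psi/\partial x_j$ exists at $x'$ and is given
by the formula in (3.7). The continuous differentiability of $\psi$ follows. □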
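
The existence argument above is effectively an algorithm: the sign condition (3.6) together with monotonicity in the last variable lets one locate $\psi(x')$ by bisection. The following is a minimal sketch under stated assumptions: the sample function $F(x', y) = y^3 + y - x'$ with $p = (2, 1)$, the bracket $[q_1, q_2] = [0, 2]$, and the names `F` and `psi` are all illustrative choices, not taken from the text.

```python
def F(x, y):
    # Hypothetical example: dF/dy = 3*y**2 + 1 > 0 everywhere, and F(2, 1) = 0.
    return y**3 + y - x


def psi(x, q1=0.0, q2=2.0, tol=1e-12):
    """Locate the unique y in (q1, q2) with F(x, y) = 0 by bisection."""
    lo, hi = q1, q2
    assert F(x, lo) < 0 < F(x, hi)  # the sign condition (3.6)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Compare a difference quotient of psi with formula (3.7):
# -(dF/dx) / (dF/dy) = 1 / (3*y**2 + 1) at y = psi(2) = 1.
h = 1e-4
slope = (psi(2.0 + h) - psi(2.0 - h)) / (2 * h)
print(psi(2.0), slope)
```

Bisection mirrors the intermediate value theorem step only; the differentiability of $\psi$ is checked numerically against (3.7) rather than derived.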


Our proof of the general implicit function theorem will be simplified notationally
if we use the following lemma from linear algebra.


Lemma 3.2.2 Let $A$ be an $n \times n$ real matrix. Then there exists an invertible real
matrix $U$ such that $UA$ is upper triangular.


Proof. The matrix A can be reduced to row echelon form by a sequence of 
elementary row operations. A square matrix in row echelon form is necessarily 
upper triangular, and each elementary row operation can be accomplished via left 
multiplication by an invertible matrix. The result follows. O 
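
The proof of Lemma 3.2.2 can be traced in code. The sketch below performs the row reduction with exact rational arithmetic and accumulates the elementary row operations in a matrix $U$; the function name `upper_triangularize` and the sample matrix are assumptions made for this illustration.

```python
from fractions import Fraction


def upper_triangularize(A):
    """Return (U, R) with U invertible and R = U A in row echelon form."""
    n = len(A)
    R = [[Fraction(v) for v in row] for row in A]
    U = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    row = 0
    for col in range(n):
        # Find a row at or below `row` with a nonzero entry in this column.
        pivot = next((r for r in range(row, n) if R[r][col] != 0), None)
        if pivot is None:
            continue
        # Elementary operation 1: swap two rows (applied to R and U alike).
        R[row], R[pivot] = R[pivot], R[row]
        U[row], U[pivot] = U[pivot], U[row]
        # Elementary operation 2: subtract a multiple of the pivot row.
        for r in range(row + 1, n):
            m = R[r][col] / R[row][col]
            R[r] = [a - m * b for a, b in zip(R[r], R[row])]
            U[r] = [a - m * b for a, b in zip(U[r], U[row])]
        row += 1
    return U, R


A = [[0, 2, 1], [1, 1, 1], [2, 4, 3]]
U, R = upper_triangularize(A)
print(U)
print(R)
```

Because every step is a row swap or the addition of a multiple of one row to another, $U$ is a product of invertible elementary matrices, which is exactly the observation the proof relies on. The sample matrix is singular, which illustrates that the lemma does not require $A$ to be invertible.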


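Now we set up the notation that we will use in the general theorem.

Notation 3.2.3 We suppose that we are given a set of equations

\[
f_i(x_1, x_2, \ldots, x_\ell;\ y_1, y_2, \ldots, y_n) = 0, \quad i = 1, 2, \ldots, n, \tag{3.13}
\]

where the functions $f_1, f_2, \ldots, f_n$ are continuously differentiable. We will assume
that $(p; q) = (p_1, p_2, \ldots, p_\ell;\ q_1, q_2, \ldots, q_n)$ is a point such that all the
equations (3.13) hold and at which we have

\[
\det \begin{pmatrix}
\dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \cdots & \dfrac{\partial f_1}{\partial y_n} \\
\dfrac{\partial f_2}{\partial y_1} & \dfrac{\partial f_2}{\partial y_2} & \cdots & \dfrac{\partial f_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial y_1} & \dfrac{\partial f_n}{\partial y_2} & \cdots & \dfrac{\partial f_n}{\partial y_n}
\end{pmatrix} \neq 0. \tag{3.14}
\]

We can think of the functions $f_i(p; \cdot)$ as giving a mapping $F : \mathbb{R}^n \to \mathbb{R}^n$
defined by

\[
y \mapsto F(y) = (f_1(p; y), f_2(p; y), \ldots, f_n(p; y)). \tag{3.15}
\]

Lemma 3.2.2, applied with $A = DF(q)$, provides us with a linear transformation
(namely left multiplication by the invertible matrix $U$ from Lemma 3.2.2) that can
be composed with this function $F$ to give us a new function

\[
y \mapsto \widetilde{F}(y) = (\tilde{f}_1(p; y), \tilde{f}_2(p; y), \ldots, \tilde{f}_n(p; y))
\]

such that

\[
\frac{\partial \tilde{f}_i}{\partial y_j}(p; q) = 0 \quad \text{whenever } i > j.
\]

It is more convenient to simply assume, without changing notation, that we have

\[
\frac{\partial f_i}{\partial y_j}(p; q) = 0 \quad \text{whenever } i > j. \tag{3.16}
\]

After this preliminary modification, we have

\[
\begin{pmatrix}
\dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \cdots & \dfrac{\partial f_1}{\partial y_n} \\
\dfrac{\partial f_2}{\partial y_1} & \dfrac{\partial f_2}{\partial y_2} & \cdots & \dfrac{\partial f_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial y_1} & \dfrac{\partial f_n}{\partial y_2} & \cdots & \dfrac{\partial f_n}{\partial y_n}
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \cdots & \dfrac{\partial f_1}{\partial y_n} \\
0 & \dfrac{\partial f_2}{\partial y_2} & \cdots & \dfrac{\partial f_2}{\partial y_n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \dfrac{\partial f_n}{\partial y_n}
\end{pmatrix} \tag{3.17}
\]

and, consequently,

\[
\det \begin{pmatrix}
\dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \cdots & \dfrac{\partial f_1}{\partial y_n} \\
\dfrac{\partial f_2}{\partial y_1} & \dfrac{\partial f_2}{\partial y_2} & \cdots & \dfrac{\partial f_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial y_1} & \dfrac{\partial f_n}{\partial y_2} & \cdots & \dfrac{\partial f_n}{\partial y_n}
\end{pmatrix}
= \prod_{i=1}^{n} \frac{\partial f_i}{\partial y_i} \neq 0 \tag{3.18}
\]

at the point $(p; q)$.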


Theorem 3.2.4 There exists a neighborhood $U \subseteq \mathbb{R}^\ell$ of $p$ and a set of continuously
differentiable functions $\phi_j : U \to \mathbb{R}$, $j = 1, 2, \ldots, n$, such that
$\phi_j(p) = q_j$, $j = 1, 2, \ldots, n$, and

\[
f_i[x;\ \phi_1(x), \phi_2(x), \ldots, \phi_n(x)] = 0, \quad i = 1, 2, \ldots, n, \tag{3.19}
\]

hold for $x \in U$.


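Proof. We argue by induction on $n$. The case $n = 1$ is of course Theorem 3.2.1.

Suppose now that $n > 1$ and that the theorem is true with $n$ replaced by $n - 1$.
We assume that we have done the preliminary simplification as in Notation 3.2.3.
By (3.18), we have

\[
\frac{\partial f_n}{\partial y_n}(p; q) \neq 0. \tag{3.20}
\]

Let us introduce the notation $y' = (y_1, y_2, \ldots, y_{n-1})$; then Theorem 3.2.1 is
applicable to the equation

\[
f_n(x; y'; y_n) = 0 \tag{3.21}
\]

at the point $(p; q'; q_n)$, where we are treating the variables $x_1, x_2, \ldots, x_\ell$ and
$y_1, y_2, \ldots, y_{n-1}$ as independent and only the variable $y_n$ as dependent. Thus, by
Theorem 3.2.1, there is a neighborhood $V \subseteq \mathbb{R}^{\ell + n - 1}$ of $(p; q')$ and a continuously
differentiable function $\psi : V \to \mathbb{R}$ such that $\psi(p; q') = q_n$ and

\[
f_n[x; y'; \psi(x; y')] = 0 \tag{3.22}
\]

holds for $(x; y') \in V$.

Notice that, if (3.22) is differentiated with respect to $y_j$, $1 \le j \le n - 1$, then
we find that

\[
\frac{\partial f_n}{\partial y_j} + \frac{\partial f_n}{\partial y_n}\, \frac{\partial \psi}{\partial y_j} = 0. \tag{3.23}
\]

Evaluating (3.23) at $x = p$, $y' = q'$, and using (3.16) and (3.20), we see that

\[
\frac{\partial \psi}{\partial y_j}(p; q') = 0 \tag{3.24}
\]

holds for $j = 1, 2, \ldots, n - 1$.

Now, for each $i = 1, 2, \ldots, n - 1$, define the function $h_i$ by setting

\[
h_i(x_1, x_2, \ldots, x_\ell;\ y_1, y_2, \ldots, y_{n-1}) = f_i[x; y'; \psi(x; y')]. \tag{3.25}
\]

Consider the system of equations

\[
h_i(x_1, x_2, \ldots, x_\ell;\ y_1, y_2, \ldots, y_{n-1}) = 0, \quad i = 1, 2, \ldots, n - 1. \tag{3.26}
\]

For $j = 1, 2, \ldots, n - 1$, by (3.24), we have

\[
\frac{\partial h_i}{\partial y_j}(p; q')
  = \frac{\partial f_i}{\partial y_j}(p; q'; q_n)
  + \frac{\partial f_i}{\partial y_n}(p; q'; q_n)\, \frac{\partial \psi}{\partial y_j}(p; q')
  = \frac{\partial f_i}{\partial y_j}(p; q) \tag{3.27}
\]

and, accordingly (by (3.18)),

\[
\det \begin{pmatrix}
\dfrac{\partial h_1}{\partial y_1} & \dfrac{\partial h_1}{\partial y_2} & \cdots & \dfrac{\partial h_1}{\partial y_{n-1}} \\
\dfrac{\partial h_2}{\partial y_1} & \dfrac{\partial h_2}{\partial y_2} & \cdots & \dfrac{\partial h_2}{\partial y_{n-1}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial h_{n-1}}{\partial y_1} & \dfrac{\partial h_{n-1}}{\partial y_2} & \cdots & \dfrac{\partial h_{n-1}}{\partial y_{n-1}}
\end{pmatrix}
= \det \begin{pmatrix}
\dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \cdots & \dfrac{\partial f_1}{\partial y_{n-1}} \\
\dfrac{\partial f_2}{\partial y_1} & \dfrac{\partial f_2}{\partial y_2} & \cdots & \dfrac{\partial f_2}{\partial y_{n-1}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_{n-1}}{\partial y_1} & \dfrac{\partial f_{n-1}}{\partial y_2} & \cdots & \dfrac{\partial f_{n-1}}{\partial y_{n-1}}
\end{pmatrix} \neq 0. \tag{3.28}
\]

Here the partial derivatives in the left-hand determinant are evaluated at $(p; q')$
and those in the right-hand determinant are evaluated at $(p; q)$.

By induction, there exist a neighborhood $U'$ of $p$ in $\mathbb{R}^\ell$ and continuously
differentiable functions $\phi_i : U' \to \mathbb{R}$ such that $\phi_i(p) = q_i$ and

\[
h_i(x;\ \phi_1(x), \phi_2(x), \ldots, \phi_{n-1}(x)) = 0, \quad i = 1, 2, \ldots, n - 1, \tag{3.29}
\]

hold for $x \in U'$.

Set $\Phi(x) = (x, \phi_1(x), \ldots, \phi_{n-1}(x))$ and

\[
U = U' \cap \Phi^{-1}(V) \tag{3.30}
\]

and define $\phi_n : U \to \mathbb{R}$ by setting

\[
\phi_n(x) = \psi[x;\ \phi_1(x), \phi_2(x), \ldots, \phi_{n-1}(x)]. \tag{3.31}
\]

By the definition of the $h_i$ (equation (3.25)) we see that the desired equations
(3.19) hold. □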


3.3. The Classical Approach to the Implicit 
Function Theorem 


The development of the implicit function theorem can be traced by looking at
some older textbooks. Initially the implicit function theorem is treated as a
theorem about a function of two real variables. The nondegeneracy condition in the
variable “to be solved for” is traditionally formulated as a monotonicity hypothesis
(this can be seen for example in Hobson [Ho 57; Section 38]). This approach
provided for a simple statement and proof of the theorem, but ignored the behavior
of the Jacobian determinant, which is the crux of the matter. It was, of course,
Dini who realized how to formulate the result in the context of several variables
and who used the Jacobian determinant to provide the correct nondegeneracy
hypothesis. Dini’s proof was inductive (see Section 3.2) and (one may feel) not as
revealing as our more modern proofs.

It is appropriate to begin our detailed discussion of the implicit and inverse
function theorems with a review of the Jacobian matrix, the Jacobian determinant,
and their role in the calculus. Let $U$, $V$ be open subsets of $\mathbb{R}^N$ and let $G : U \to V$
be a $C^1$ mapping. We write $G(x) = (g_1(x), \ldots, g_N(x))$. If $p \in U$, then the
Jacobian matrix of $G$ at $p$ is

\[
DG(p) = \begin{pmatrix}
\dfrac{\partial g_1}{\partial x_1}(p) & \dfrac{\partial g_1}{\partial x_2}(p) & \cdots & \dfrac{\partial g_1}{\partial x_N}(p) \\
\dfrac{\partial g_2}{\partial x_1}(p) & \dfrac{\partial g_2}{\partial x_2}(p) & \cdots & \dfrac{\partial g_2}{\partial x_N}(p) \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g_N}{\partial x_1}(p) & \dfrac{\partial g_N}{\partial x_2}(p) & \cdots & \dfrac{\partial g_N}{\partial x_N}(p)
\end{pmatrix}.
\]

The Jacobian matrix plays the same role in calculus of several variables as the
first derivative (a $1 \times 1$ matrix) does in the calculus of one variable. In particular,
the $C^1$ mapping $G$ is well approximated near $p$ by its Jacobian matrix $DG(p)$.
More precisely, the change in $G$, that is, $G(p + v) - G(p)$, is well approximated
by the vector $v$ left-multiplied by $DG(p)$. In order to recall the calculus concept
of “differential,” we use the letter “D” to denote this matrix.
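
The approximation property just described is easy to observe numerically. The following sketch is an illustration under stated assumptions: the sample map `G`, the base point `p`, the displacement `v`, and the central-difference helper `jacobian` are hypothetical choices, not taken from the text.

```python
import math


def G(x):
    # Hypothetical C^1 map from R^2 to R^2.
    x1, x2 = x
    return [x1 * x1 - x2, math.sin(x1) + x2 * x2]


def jacobian(G, p, h=1e-6):
    """Approximate DG(p) by central differences, one column per variable."""
    n = len(p)
    cols = []
    for j in range(n):
        up, dn = list(p), list(p)
        up[j] += h
        dn[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(G(up), G(dn))])
    # Transpose the list of columns into a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def apply(M, v):
    """Ordinary matrix-vector multiplication."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


p = [1.0, 2.0]
v = [1e-3, -2e-3]
DG = jacobian(G, p)
change = [a - b for a, b in zip(G([p[0] + v[0], p[1] + v[1]]), G(p))]
linear = apply(DG, v)
print(change, linear)
```

The two printed vectors differ by a quantity of order $|v|^2$, which is the sense in which $DG(p)\,v$ approximates $G(p + v) - G(p)$.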

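In the context of the inverse function theorem, we consider the Jacobian
determinant, which is

\[
\det DG(p) = \det \begin{pmatrix}
\dfrac{\partial g_1}{\partial x_1}(p) & \dfrac{\partial g_1}{\partial x_2}(p) & \cdots & \dfrac{\partial g_1}{\partial x_N}(p) \\
\dfrac{\partial g_2}{\partial x_1}(p) & \dfrac{\partial g_2}{\partial x_2}(p) & \cdots & \dfrac{\partial g_2}{\partial x_N}(p) \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g_N}{\partial x_1}(p) & \dfrac{\partial g_N}{\partial x_2}(p) & \cdots & \dfrac{\partial g_N}{\partial x_N}(p)
\end{pmatrix}.
\]

The basic result, as we shall see below, is that when $\det DG(p) \neq 0$, then the
restriction of the mapping $G$ to a small neighborhood of $p$ is invertible.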

The Jacobian matrix represents the aggregate of information about the first- 
order behavior of a function near a point. As such, we apply the matrix to a (not 
necessarily unit) vector v to obtain information about a directional derivative in 
the direction of that vector. Then we denote the directional derivative of the func- 
tion G in the direction v at the point p by 


\[
\langle DG(p), v \rangle.
\]


This notation is to be read as the matrix DG(p) applied to the vector v using 
ordinary matrix multiplication. 

We note that there are many different notations for the fundamental idea of Ja- 
cobian. A number of sources denote the Jacobian matrix of the mapping G by 
Jac G. The Jacobian determinant is then det Jac G. Other references will denote 
the Jacobian determinant by |Jac G|. Still other works use the word “Jacobian” to 
mean the Jacobian determinant. In the present book, we use “Jacobian” to mean
the Jacobian matrix (denoted $DG$) and “Jacobian determinant” to mean the
determinant of the Jacobian matrix (denoted $\det DG$). An excellent reference for
the concepts of Jacobian and Jacobian determinant, from the point of view of the 
calculus, is [Fl 77]. 

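For the implicit function theorem, we do not consider equidimensional mappings.
Therefore we are forced to look at the Jacobian in the variables for which
we wish to solve. A useful notation in this context involves the components of a
function, say

\[
\Phi(x) = \Phi(x_1, \ldots, x_N) = (\phi_1(x_1, \ldots, x_N), \ldots, \phi_M(x_1, \ldots, x_N)),
\]

a choice of some $M$ arguments of the function, say

\[
x_{i_1}, \ldots, x_{i_M},
\]

and expresses the appropriate Jacobian determinant in the form

\[
\frac{\partial(\phi_1, \ldots, \phi_M)}{\partial(x_{i_1}, \ldots, x_{i_M})}
= \det \begin{pmatrix}
\dfrac{\partial \phi_1}{\partial x_{i_1}} & \dfrac{\partial \phi_1}{\partial x_{i_2}} & \cdots & \dfrac{\partial \phi_1}{\partial x_{i_M}} \\
\dfrac{\partial \phi_2}{\partial x_{i_1}} & \dfrac{\partial \phi_2}{\partial x_{i_2}} & \cdots & \dfrac{\partial \phi_2}{\partial x_{i_M}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial \phi_M}{\partial x_{i_1}} & \dfrac{\partial \phi_M}{\partial x_{i_2}} & \cdots & \dfrac{\partial \phi_M}{\partial x_{i_M}}
\end{pmatrix}.
\]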

A standard formulation of the implicit function theorem is this: 


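Theorem 3.3.1 (The Implicit Function Theorem). Let

\[
\Phi(x) = \Phi(x_1, \ldots, x_N) = (\phi_1(x_1, \ldots, x_N), \ldots, \phi_M(x_1, \ldots, x_N))
\]

be a mapping of class $C^k$, $k \ge 1$, defined on an open set $U \subseteq \mathbb{R}^N$ and taking
values in $\mathbb{R}^M$. We assume that $1 \le M < N$. Set $Q = N - M$.

Let $x^0 = (x_1^0, \ldots, x_N^0)$ be a fixed point of $U$. Of course we let $x = (x_1, \ldots, x_N)$
be any point of $U$. Set

\[
x_a = (x_1, \ldots, x_Q) \quad \text{and} \quad x_a^0 = (x_1^0, \ldots, x_Q^0).
\]

We suppose that

\[
\frac{\partial(\phi_1, \ldots, \phi_M)}{\partial(x_{Q+1}, \ldots, x_N)}(x^0) \neq 0. \tag{3.32}
\]

Then there exists a neighborhood $U$ of $x^0$, an open set $W \subseteq \mathbb{R}^Q$ containing
$x_a^0$, and functions $f_1, \ldots, f_M$ of class $C^k$ on $W$ such that

\[
\Phi(x_1, \ldots, x_Q, f_1(x_a), \ldots, f_M(x_a)) = 0 \quad \text{for every } x_a \in W. \tag{3.33}
\]

Furthermore $f_1, \ldots, f_M$ are the unique functions satisfying

\[
\{x \in U : \Phi(x) = 0\}
= \{x \in U : x_a \in W,\ x_{Q+\ell} = f_\ell(x_a) \text{ for } \ell = 1, \ldots, M\}.
\]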

As a companion result, we now formulate (in consistent notation) the inverse 
function theorem. 


Theorem 3.3.2 (The Inverse Function Theorem). Let $W \subseteq \mathbb{R}^Q$ be an open set
and let $G : W \to \mathbb{R}^Q$ be a mapping of class $C^k$, $k \ge 1$.

Let $x^0$ be a fixed point of $W$, and assume that $\det DG(x^0) \neq 0$. Then there
exists a neighborhood $\widetilde{W} \subseteq W$ of $x^0$ such that

1. The restriction $G|_{\widetilde{W}}$ is univalent;

2. The set $V = G(\widetilde{W})$ is open;

3. The inverse $G^{-1}$ of $G|_{\widetilde{W}}$ is of class $C^k$.


We now present a classical proof of our two main theorems. More precisely, we 
shall prove that the inverse function theorem implies the Implicit Function Theo- 
rem (the converse is trivial). Then we shall prove the inverse function theorem. 
By “classical” here, we mean an argument that uses only calculus and elemen- 
tary estimates. The reader is encouraged to compare and contrast this proof with 
the Banach space proof that is given in Section 3.4. After we present the classical 
ideas, we shall see what consequences may be drawn from both arguments. 


Proof that the Inverse Function Theorem Implies 
the Implicit Function Theorem 


Our mapping $\Phi$ is at least $C^1$; hence the Jacobian determinant (3.32) of $\Phi$ is
continuous. So there is a neighborhood $U_0$ of $x^0$ on which this Jacobian does not
vanish.

Let us consider the transformation

\[
G : U_0 \to \mathbb{R}^N
\]

given by

\[
G(x) = (x_1, \ldots, x_Q, \phi_1(x), \ldots, \phi_M(x)).
\]

Then of course $G$ is a mapping of class $C^k$, just as is $\Phi$. Its Jacobian matrix is

\[
\begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
\partial \phi_1/\partial x_1 & \cdots & \partial \phi_1/\partial x_Q & \partial \phi_1/\partial x_{Q+1} & \cdots & \partial \phi_1/\partial x_N \\
\vdots & & \vdots & \vdots & & \vdots \\
\partial \phi_M/\partial x_1 & \cdots & \partial \phi_M/\partial x_Q & \partial \phi_M/\partial x_{Q+1} & \cdots & \partial \phi_M/\partial x_N
\end{pmatrix}.
\]

Obviously the determinant of this matrix is just the determinant of the $M \times M$
block in the lower-right-hand corner. Since this is merely the Jacobian in (3.32),
we see that $\det DG(x) \neq 0$ for every $x \in U_0$.

By the inverse function theorem, we may now conclude that there is a neighborhood
$W$ of $x^0$ such that $G(W)$ is an open set and the restriction $G|_W$ has an
inverse $G^{-1}$ of class $C^k$.

Now let us write $(x_a, 0) = (x_1, \ldots, x_Q, 0, \ldots, 0)$. We set

\[
R = \{x_a : (x_a, 0) \in G(W)\}.
\]

Since $G(W)$ is an open subset of $\mathbb{R}^N$, $R$ is also open. Now, for every $x_a \in R$, we
let

\[
f_\ell(x_a) = [G^{-1}]_{Q+\ell}(x_a, 0), \quad \ell = 1, \ldots, M,
\]

where $[G^{-1}]_{Q+\ell}$ denotes the $(Q+\ell)$th component of $G^{-1}$.
For $x \in W$, $\Phi(x) = 0$ if and only if $x_a \in R$ and $G(x) = (x_a, 0)$. Since $G|_W$ and
$G^{-1}$ are inverses, $G(x) = (x_a, 0)$ if and only if $x = G^{-1}(x_a, 0)$. This completes
the proof. □
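
The construction in this proof can be imitated numerically: augment $\Phi$ to the equidimensional map $G$, invert $G$ at the point $(x_a, 0)$, and read off the last component. The sketch below is an illustration under stated assumptions. It takes $N = 2$, $M = Q = 1$, the sample function $\Phi(x_1, x_2) = x_1^2 + x_2^2 - 1$ near $x^0 = (0, 1)$, and it computes $G^{-1}$ by Newton's method, which is a swapped-in numerical technique and not part of the proof. The names `phi`, `G`, `G_inverse` and `f` are hypothetical.

```python
def phi(x1, x2):
    # Hypothetical example: the unit circle, solved for x2 near (0, 1).
    return x1 * x1 + x2 * x2 - 1.0


def G(x1, x2):
    # The augmented map G(x) = (x_1, phi(x)) used in the proof.
    return (x1, phi(x1, x2))


def G_inverse(u1, u2, start=(0.0, 1.0), steps=50):
    """Solve G(x) = (u1, u2) near `start` by Newton's method."""
    x1, x2 = start
    for _ in range(steps):
        g1, g2 = G(x1, x2)
        r1, r2 = g1 - u1, g2 - u2
        # DG(x) = [[1, 0], [2*x1, 2*x2]]; solve DG(x) d = r by substitution.
        d1 = r1
        d2 = (r2 - 2.0 * x1 * d1) / (2.0 * x2)
        x1, x2 = x1 - d1, x2 - d2
    return x1, x2


def f(xa):
    # f(x_a) is the last component of G^{-1}(x_a, 0), as in the proof.
    return G_inverse(xa, 0.0)[1]


print(f(0.3), phi(0.3, f(0.3)))
```

The determinant of $DG$ here is $2 x_2$, the $1 \times 1$ lower-right block, so the division in `G_inverse` is legitimate exactly where the hypothesis (3.32) holds.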


Proof of the Inverse Function Theorem 


This proof is considerable work, and we divide it into three steps. The argument 
follows the one that appears in Fleming [Fl 77]. 


Step 1: The mapping $G$ is locally univalent.
Fix an arbitrary point $t^0 \in W$. Let $h$ denote the inverse matrix of $DG(t^0)$, and let
$\|h\|$ denote the norm of $h$ considered as a linear operator on $\mathbb{R}^Q$. Define

\[
c = \frac{1}{\|h\|}.
\]

Let $L$ denote the Jacobian matrix $DG$ of $G$ at $t^0$, and set $\widetilde{G}(t) = G(t) - L(t)$. Then,
for $s, t \in W$,

\[
G(s) - G(t) = L(s) - L(t) + [\widetilde{G}(s) - \widetilde{G}(t)].
\]

But

\[
|\widetilde{G}(s) - \widetilde{G}(t)| / |s - t| \to 0 \quad \text{as } s, t \to t^0.
\]

Therefore, for $\epsilon > 0$,

\[
|G(s) - G(t)| \ge |L(s) - L(t)| - \epsilon \cdot |s - t|
\]

provided that $s$ and $t$ are sufficiently close to $t^0$. But, plainly,

\[
|L(s) - L(t)| \ge c \cdot |s - t|. \tag{3.34}
\]

This shows that

\[
|G(s) - G(t)| \ge (c - \epsilon)\, |s - t|.
\]

If we take $\epsilon = c/2$, then we find that

\[
|G(s) - G(t)| \ge [c/2] \cdot |s - t|.
\]

Thus $G(s) = G(t)$ implies that $s = t$, and we see that the mapping $G$ is locally
one-to-one on $W$.

Step 2: The set $G(W)$ is open.
Set $V = G(W)$. Let $x^*$ be any point of $V$. We show that $x^*$ has a neighborhood
that lies in $V$.

By Step 1 we may choose $t^* \in W$ such that $G(t^*) = x^*$, where $W$ is a neighborhood
of $t^*$ on which $G$ is univalent. Let $W^*$ be an open ball about $t^*$ whose
closure lies in $W$, and let $S$ denote the boundary of $W^*$. Univalence now implies
that $x^* \notin G(S)$. Of course the image under $G$ of $S$ is compact. Let

\[
\sigma^* = \tfrac{1}{2}\, \mathrm{dist}(x^*, G(S)).
\]

Let $V^* = B(x^*, \sigma^*)$.
Now fix an arbitrary point $x \in V^*$. Then, for every $t \in S$,

\[
2\sigma^* \le |x^* - G(t)| \le |x^* - x| + |x - G(t)|.
\]

Since $|x^* - x| < \sigma^*$, we see that

\[
\sigma^* < |x - G(t)|
\]

for every $t \in S$.
For $t \in W$, define

\[
F(t) = |x - G(t)|^2 = \sum_{j=1}^{Q} [x_j - g_j(t)]^2,
\]

where the $g_j$ are the components of $G$. Then $F$ is of class $C^k$ and must have a
minimum on the compact $Q$-ball $\overline{W^*}$. But

\[
F(t^*) = |x - x^*|^2 < [\sigma^*]^2
\]

and

\[
F(t) > [\sigma^*]^2 \quad \text{for every } t \in S.
\]

It follows that the minimum value of $F$ on $\overline{W^*}$ is less than $[\sigma^*]^2$ and hence must
occur at some (interior) point $\hat{t}$ of $W^*$. Hence the partial derivatives of $F$ at $\hat{t}$ must
be 0.
Since

\[
\frac{\partial}{\partial t_k} F(t) = -2 \sum_{j=1}^{Q} [x_j - g_j(t)]\, \frac{\partial g_j}{\partial t_k}(t)
\]

($\partial/\partial t_k$ is used because $x$ is fixed and $t$ is variable), we have (setting
$c_j = x_j - g_j(\hat{t})$)

\[
0 = \sum_{j=1}^{Q} c_j\, \frac{\partial g_j}{\partial t_k}(\hat{t}), \quad \text{each } k.
\]

Of course $\det DG(\hat{t}) \neq 0$, hence the column vectors

\[
\left( \frac{\partial g_1}{\partial t_k}(\hat{t}), \ldots, \frac{\partial g_Q}{\partial t_k}(\hat{t}) \right), \quad k = 1, \ldots, Q,
\]

are linearly independent. We conclude that $c_j = 0$, $j = 1, \ldots, Q$, and hence that
$x = G(\hat{t})$.

We have proved that if $x \in V^*$, then $x = G(\hat{t})$ for some $\hat{t} \in W^*$, or $x \in G(W^*)$.
Thus $V^* \subseteq G(W^*) \subseteq V$, so $V = G(W)$ is open.


Notation. For use in the remainder of the proof, we fix a neighborhood $W$ of $x^0$
on which $G$ is univalent and on which $\det DG$ never equals 0.


Step 3: The mapping $(G|_W)^{-1}$ is of class $C^k$.
First notice that $(G|_W)^{-1}$ exists, by the local univalence established in Step 1. Let
$x^* \in V$ and $t^* = (G|_W)^{-1}(x^*)$ as in the preceding step. Set $L^* = DG(t^*)$. We
now show that $G^{-1}$ is differentiable at $x^*$ and that $DG^{-1}(x^*) = (L^*)^{-1}$.

Set $\tilde{c} = 1/\|(L^*)^{-1}\|$. For any $\epsilon > 0$, there is a ball $B = B(t^*, r^*) \subseteq W$ such
that

\[
|G(t) - G(t^*) - L^*(t - t^*)| \le \frac{c\, \tilde{c}\, \epsilon}{2}\, |t - t^*| \tag{3.35}
\]

for every $t \in B$. Here the constant $c$ is as in (3.34) in Step 1.

By Step 2, there is a neighborhood $B^* = B(x^*, s^*)$ such that $B^* \subseteq G(B)$. Let
$x \in B^*$. Then $x = G(t)$ for some $t \in B$. Since $x^* = G(t^*)$, we find from Step 1
that

\[
\frac{c}{2}\, |t - t^*| \le |x - x^*|. \tag{3.36}
\]

Further, since $t = G^{-1}(x)$ and $t^* = G^{-1}(x^*)$, we see that

\[
L^*\left[ G^{-1}(x) - G^{-1}(x^*) - (L^*)^{-1}(x - x^*) \right] = -\left[ G(t) - G(t^*) - L^*(t - t^*) \right].
\]

Since $\tilde{c}\, |w| \le |L^*(w)|$ for every $w \in \mathbb{R}^Q$, we find that

\[
\tilde{c}\, \left| G^{-1}(x) - G^{-1}(x^*) - (L^*)^{-1}(x - x^*) \right| \le \left| G(t) - G(t^*) - L^*(t - t^*) \right|.
\]

Thus (3.35) and (3.36) yield, for every $x \in B^*$, that

\[
\left| G^{-1}(x) - G^{-1}(x^*) - (L^*)^{-1}(x - x^*) \right| \le \epsilon\, |x - x^*|.
\]

We see therefore that $G^{-1}$ is differentiable at $x^*$ and $DG^{-1}(x^*) = (L^*)^{-1}$.
To summarize, the mapping $G^{-1}$ is a differentiable function and

\[
DG^{-1}(x) = \left[ DG(G^{-1}(x)) \right]^{-1} \tag{3.37}
\]

for every $x \in V$. Of course it then follows that $G^{-1}$ is continuous. Since each
$\partial g_i/\partial x_j$ is a continuous function, the composition

\[
\left[ \frac{\partial g_i}{\partial x_j} \right] \circ G^{-1}
\]

is also continuous. It follows then from (3.37) (and Cramer’s rule) that the partial
derivatives

\[
\frac{\partial [G^{-1}]_i}{\partial x_j}
\]

are all continuous. As a result, the mapping $G^{-1}$ is of class $C^1$.

If now $G$ is of class $C^2$, then each $\partial g_i/\partial x_j$ is of class $C^1$ and therefore
$[\partial g_i/\partial x_j] \circ G^{-1}$ is of class $C^1$. As a result, $[\partial/\partial x_j][G^{-1}]_i$ is of class $C^1$
and so $G^{-1}$ is of class $C^2$. Inductively, we find that if $G$ is of class $C^k$ then so is
$G^{-1}$. That completes the proof of Step 3.


Steps 1, 2, and 3 taken together complete our proof of the inverse function
theorem, and therefore of the implicit function theorem. □


3.4 The Contraction Mapping Fixed Point Principle 


Let $X$ be a complete metric space with metric $\rho$. A mapping $F : X \to X$ is called
a contraction if there is a constant $0 < c < 1$ such that

\[
\rho(F(x), F(y)) \le c \cdot \rho(x, y)
\]

for all $x, y \in X$. The fact that $c < 1$ tells us that the image of a set under $F$ is
contracted: the points in $F(X)$ are closer together than are their pre-images in
$X$. The basic theorem about contraction mappings is as follows.


Theorem 3.4.1 (Contraction Mapping Fixed Point Principle). Let $F : X \to X$
be a contraction of the complete metric space $X$. Then $F$ has a unique fixed point.
That is, there is a unique point $x_0 \in X$ such that $F(x_0) = x_0$.


Proof. Let $P \in X$ be any point. We define a sequence inductively by

\[
\begin{aligned}
x_1 &= P, \\
x_2 &= F(x_1), \\
    &\ \ \vdots \\
x_j &= F(x_{j-1}).
\end{aligned}
\]

We claim that $\{x_j\}$ is a Cauchy sequence in $X$. Suppose for the moment that this
claim has been proved. Then, because $X$ is complete, there is a limit point $x_0$ of
the sequence. Furthermore, since a contraction is continuous,

\[
F(x_0) = F\left( \lim_{j \to \infty} x_j \right) = \lim_{j \to \infty} F(x_j) = \lim_{j \to \infty} x_{j+1} = x_0.
\]

So $x_0$ is certainly a fixed point for the mapping $F$. If $\tilde{x}_0$ were another fixed point,
then we would have

\[
\rho(x_0, \tilde{x}_0) = \rho(F(x_0), F(\tilde{x}_0)) \le c \cdot \rho(x_0, \tilde{x}_0).
\]

Since $0 < c < 1$, the only possible conclusion is that $\rho(x_0, \tilde{x}_0) = 0$ or $x_0 = \tilde{x}_0$.
That establishes the existence and uniqueness of $x_0$. It remains to prove the claim.
We calculate, for $j > 2$, that

\[
\begin{aligned}
\rho(x_j, x_{j-1}) &= \rho(F(x_{j-1}), F(x_{j-2})) \le c \cdot \rho(x_{j-1}, x_{j-2}) \\
                   &= c \cdot \rho(F(x_{j-2}), F(x_{j-3})) \le c^2 \cdot \rho(x_{j-2}, x_{j-3}) \\
                   &\le \cdots \le c^{j-2} \cdot \rho(x_2, x_1).
\end{aligned}
\]

As a result,

\[
\begin{aligned}
\rho(x_{j+k}, x_j) &\le \rho(x_{j+k}, x_{j+k-1}) + \rho(x_{j+k-1}, x_{j+k-2}) + \cdots + \rho(x_{j+1}, x_j) \\
                   &\le \left[ c^{j+k-2} + c^{j+k-3} + \cdots + c^{j-1} \right] \rho(x_2, x_1) \\
                   &\le c^{j-1} \cdot \frac{1}{1 - c} \cdot \rho(x_2, x_1).
\end{aligned}
\]

In particular, if $\epsilon > 0$ and if $j$ is large enough then, regardless of the value of
$k \ge 1$,

\[
\rho(x_{j+k}, x_j) < \epsilon.
\]

Thus the sequence $\{x_j\}$ is Cauchy, and the claim is proved. □
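
The proof is constructive: iterating $F$ from any starting point converges to the fixed point, and the geometric series estimate bounds the remaining error in terms of the length of the first step. The following sketch is an illustration under stated assumptions: the sample contraction $F(x) = \tfrac{1}{2}\cos x$ on $\mathbb{R}$ (contraction constant $c = \tfrac{1}{2}$), the starting point 3.0, and the helper name `iterate` are hypothetical choices.

```python
import math


def F(x):
    # Hypothetical contraction on R: |F'(x)| = 0.5*|sin x| <= 0.5.
    return 0.5 * math.cos(x)


def iterate(F, P, n):
    """Return [P, F(P), F(F(P)), ...] with n applications of F."""
    xs = [P]
    for _ in range(n):
        xs.append(F(xs[-1]))
    return xs


c = 0.5
xs = iterate(F, 3.0, 60)
fixed = xs[-1]                 # numerically converged iterate
first_step = abs(xs[1] - xs[0])

# After m applications of F the error is at most c**m / (1 - c) times the
# first step; the tiny slack only absorbs floating-point rounding.
bounds_ok = all(
    abs(fixed - xs[m]) <= c**m / (1 - c) * first_step + 1e-15
    for m in range(len(xs))
)
print(fixed, bounds_ok)
```

The case $m = 0$ of the bound says that the starting point lies within (first step)$/(1 - c)$ of the fixed point, which is the kind of distance estimate used later in this section.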


In fact we shall need in practice a slight variant of the contraction mapping 
fixed point theorem. We now state and establish this result. 


Proposition 3.4.2 Let $B = \overline{B}(p, r)$ be a closed ball in a complete metric space
$X$. Suppose that $H : B \to X$ is a contraction (with contraction constant $c$,
$0 < c < 1$) such that $\rho(H(p), p) \le (1 - c) r$. Then $H$ has a unique fixed point
in $B$.


Proof. If $x \in B$ is any point, then

\[
\begin{aligned}
\rho(H(x), p) &\le \rho(H(x), H(p)) + \rho(H(p), p) \\
              &\le c \cdot \rho(x, p) + (1 - c) r \\
              &\le c r + (1 - c) r = r.
\end{aligned}
\]

This inequality verifies that the image of $H$ lies in $B$, hence $H : B \to B$. Since $B$
is a closed subset of the complete space $X$, it is itself complete, so Theorem 3.4.1
now applies with $X$ replaced by $B$. The proof is therefore complete. □


Proposition 3.4.3 Let $B = B(p, r)$ be an open ball in a complete metric space
$X$. Assume that $H : B \to X$ is a contraction (with contraction constant $c$,
$0 < c < 1$) such that $\rho(H(p), p) < (1 - c) r$. Then $H$ has a unique fixed point in $B$.


Proof. Simply restrict $H$ to a slightly smaller closed ball $D$ that is concentric
with and contained in $B$. Apply Proposition 3.4.2. □


Proposition 3.4.4 Suppose that $H$ is a contraction (with contraction constant $c$,
$0 < c < 1$) on the complete metric space $X$. Let $x \in X$, and suppose that
$\rho(H(x), x) = d$. Then the distance from $x$ to the fixed point $p$ (guaranteed to
exist by Theorem 3.4.1) is at most $d/(1 - c)$.


Proof. Let $B$ denote the closed ball in $X$ with center $x$ and radius $r = d/(1 - c)$.
Apply Proposition 3.4.2 to the restriction of $H$ to $B$. Thus the fixed point lies in
$B$, and we are done. □


Proposition 3.4.5 Let $X$ be a complete metric space, and $S$ any metric space.
Suppose that $H : S \times X \to X$. Assume that $H(s, x)$ is a contraction in $X$ uniformly
over $s \in S$ (with uniform contraction constant $c$, $0 < c < 1$), that is,

\[
\rho(H(s, x), H(s, y)) \le c \cdot \rho(x, y)
\]

for all $s \in S$ and all $x, y \in X$. Further, assume that $H$ is continuous in $s$ for
each fixed $x \in X$. For each $s \in S$, let $p_s \in X$ be the unique fixed point satisfying
$H(s, p_s) = p_s$. Then the map $s \mapsto p_s$ is a continuous function of $s$.


Proof. Choose $t \in S$. Let $\epsilon > 0$. By the continuity of $H$ in the first variable,
choose $\delta > 0$ so that if $\rho(s, t) < \delta$, then $\rho(H(s, p_t), H(t, p_t)) < \epsilon$. Since
$H(t, p_t) = p_t$, this last inequality says that the contraction with parameter value
$s$ moves $p_t$ a distance at most $\epsilon$. Thus $\rho(p_s, p_t) \le \epsilon/(1 - c)$ by Proposition 3.4.4.

That is, $\rho(s, t) < \delta$ implies that $\rho(p_s, p_t) \le \epsilon/(1 - c)$. We conclude that the
mapping $s \mapsto p_s$ is continuous at $t \in S$. □


Now we draw together Propositions 3.4.3 and 3.4.5 into a single result which 
will be the one that is used to establish the implicit function theorem (Theo- 
rem 3.4.10). The reader should bear in mind that this next theorem is just a 
slight variant of the basic contraction mapping fixed point result given by Theo- 
rem 3.4.1. Because we have taken the trouble to do some elementary preliminary 
manipulations with the contraction mapping theorem, our proof of the implicit 
function theorem will therefore be short and elegant. 


Theorem 3.4.6 Let $X$ be a complete metric space, and $S$ any metric space. Let
$B = B(p, r)$ be an open ball in $X$. Let $H$ be a mapping from $S \times B$ to $X$ which is a
contraction in $X$ uniformly over $s \in S$ (with uniform contraction constant $c$,
$0 < c < 1$); further, suppose that $H$ is continuous in the $s$ variable for each fixed value
of $x \in B$. Finally assume that, for each $s \in S$, we know that $\rho(H(s, p), p) <
(1 - c) r$.

Then, for each $s \in S$, there is a unique $p_s \in B$ such that $H(s, p_s) = p_s$;
furthermore, the mapping $s \mapsto p_s$ is continuous from $S$ to $B$.


Proof. The result is immediate from Propositions 3.4.3 and 3.4.5. □


Now the implicit function theorem, formulated in a manner that lends itself to 
proof by the contraction mapping fixed point principle, will be our next major 
result. We first introduce some definitions and notation. 


Notation 3.4.7 (Landau) Fix $a$ in the extended reals, that is, $a \in \mathbb{R} \cup \{\pm\infty\}$.
Suppose $g$ is a real-valued function on $\mathbb{R} \cup \{\pm\infty\}$ that does not vanish in a
neighborhood of $a$. For a real-valued function $f$ defined in a punctured neighborhood
of $a$, we say $f$ is little “o” of $g$ as $x \to a$ and write

\[
f(x) = o(g(x)) \quad \text{as } x \to a
\]

in case

\[
\lim_{x \to a} \frac{f(x)}{g(x)} = 0.
\]


Definition 3.4.8 Let $X$, $Z$ be normed linear spaces. Let $x \in X$, and suppose that
$U$ is a neighborhood of $x$ in $X$. A mapping $F : U \to Z$ is differentiable at $x$ if
there is a linear operator $T$ from $X$ to $Z$ such that

\[
F(x + \xi) - F(x) = T(\xi) + o(\|\xi\|)
\]

for all small $\xi \in X$.


Of course this definition, commonly known as the “Fréchet definition” of deriva- 
tive, simply says that the function F may be well approximated at x by the linear 
map T. It is straightforward to check that, if T exists, then it is unique. We call T 
the derivative of F at x. The reader may verify as an exercise that this definition 
is consistent with the standard one for differentiability in finite dimensions. 


Definition 3.4.9 If $X$, $Y$, $Z$ are Banach spaces and $F : X \times Y \to Z$ is a mapping
then, for each fixed $x_0 \in X$, we may consider the differentiability of the mapping

\[
Y \ni y \mapsto F(x_0, y).
\]

If the derivative exists at a point $y_0 \in Y$, then we denote it by $d_2 F(x_0, y_0)$.


Now we have the implicit function theorem, formulated so that it can be proved 
with the contraction mapping fixed point principle: 


Theorem 3.4.10 Let $X$, $Y$, $Z$ be Banach spaces. Let $U \times V$ be an open subset of
$X \times Y$. Suppose that $G : U \times V \to Z$ is continuous and has the property that
$d_2 G$ exists and is continuous at each point of $U \times V$.

Assume that the point $(x, y) \in U \times V$ has the property that $G(x, y) = 0$ and
that $d_2 G(x, y)$ is invertible.

Then there are open balls $M = B_X(x, r)$ and $N = B_Y(y, s)$ such that, for each
$\zeta \in M$, there is a unique $\eta \in N$ satisfying $G(\zeta, \eta) = 0$. The function $f$, thereby
uniquely defined near $x$ by the condition $f(\zeta) = \eta$, is continuous.


Proof. Let $T = d_2G(x, y)$ and, for $\alpha \in U$, $\beta \in V$, set

$$L(\alpha, \beta) = \beta - T^{-1}[G(\alpha, \beta)].$$

Of course T is invertible by hypothesis.
It follows by inspection that L is a continuous mapping from $U \times V$ to Y such
that

$$L(x, y) = y - T^{-1}[G(x, y)] = y - T^{-1}0 = y.$$

Also L is continuously differentiable in the second entry and

$$d_2L(x, y) = \mathrm{id} - T^{-1} \circ [d_2G(x, y)] = 0.$$

Since $d_2L(\cdot, \cdot)$ is a continuous function of its arguments, there is a product
of balls $M \times N$ (with $M = B(x, r)$ and $N = B(y, s)$) about (x, y) on which
$\|d_2L(\cdot, \cdot)\|$ is bounded by 1/2. We may also assume, shrinking the ball M if
necessary, that $\|L(\alpha, y) - y\| < s/2$ for $\alpha \in M$.

The mean-value theorem applied to L in its second variable (parametrize with a
segment) now implies that $L(\alpha, \cdot)$ is a contraction, with constant 1/2 (this
statement is true uniformly in $\alpha \in M$). By Theorem 3.4.6, we conclude that for each
$\zeta \in M$ there is a unique $\eta \in N$ such that $L(\zeta, \eta) = \eta$; moreover, the mapping
$\zeta \mapsto \eta$ is continuous. Since $L(\zeta, \eta) = \eta$ if and only if $G(\zeta, \eta) = 0$, the result is
proved. □
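
The fixed-point iteration behind this proof can be tried numerically. The following is a minimal sketch, not part of the original argument: it assumes the illustrative scalar equation G(x, y) = x^2 + y^2 - 1 near the point (0, 1), so that T = d_2G(0, 1) = 2, and it iterates the map L(α, β) = β - T^{-1}G(α, β) from the proof. The names `G`, `L`, and `solve_implicit` are hypothetical.

```python
import math

def G(alpha, beta):
    # Illustrative choice (an assumption): the unit circle, with G(0, 1) = 0.
    return alpha**2 + beta**2 - 1.0

T = 2.0  # d_2 G(0, 1) = 2 * beta evaluated at beta = 1

def L(alpha, beta):
    # The map from the proof: L(alpha, beta) = beta - T^{-1} G(alpha, beta).
    return beta - G(alpha, beta) / T

def solve_implicit(alpha, beta0=1.0, iterations=60):
    # Iterate beta -> L(alpha, beta); near (0, 1) this is a contraction.
    beta = beta0
    for _ in range(iterations):
        beta = L(alpha, beta)
    return beta

eta = solve_implicit(0.3)
print(eta, math.sqrt(1.0 - 0.3**2))
```

For this choice the fixed point is the upper branch of the circle, so the two printed values should agree to rounding error.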


Remark 3.4.11 The use of the contraction mapping principle in proving the
implicit function theorem is an intellectual descendant of the iterative proof first
given in Goursat [Go 03]. Édouard Goursat (1858-1936) was inspired in turn by
Charles Émile Picard's (1856-1941) iterative proof of the existence of solutions
to ordinary differential equations.


3.5 The Rank Theorem and the Decomposition 
Theorem 


The rank theorem is a variant of the implicit function theorem that is tailored to 
situations in which the Jacobian matrix of the mapping under study is of constant 
rank but not full rank. Consider, for instance, the example $F : \mathbb{R}^3 \to \mathbb{R}^3$ given by

$$F(x, y, z) = (x^3 + y + z,\; y^3 + y,\; x^3 - y^3 + z).$$

Then the Jacobian of F is

$$A = \begin{pmatrix} 3x^2 & 1 & 1 \\ 0 & 3y^2 + 1 & 0 \\ 3x^2 & -3y^2 & 1 \end{pmatrix}.$$

Since the first and third columns of A are dependent, the rank of A is everywhere
less than or equal to 2. On the other hand, the first and third rows of A are
independent. Thus A has rank 2 at every point. If $b \in \mathbb{R}^3$ is a point in the range
of F (for specificity, let us say that b = (30, 10, 20)) then we see that $F^{-1}(b)$
is a 1-dimensional set. For example, the point (3, 2, 1) lies in $F^{-1}(b)$ and is an
element of the variety

$$F^{-1}(b) = \{(x, 2, z) : z = -x^3 + 28\}.$$

This is a nonsingular curve of dimension 1. The geometric setup for other level
sets is similar: In effect, the level sets of the mapping F foliate the domain of F.
Obversely, if we look at the image of F, then we see that it is a smooth
2-dimensional manifold given by $\{(x, y, z) : z = x - y\}$.
The rank theorem formalizes the observations that we have made for this spe- 
cific mapping F in the context of a class of mappings having constant rank. 
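
The claims made about this example can be checked with exact integer arithmetic. The sketch below is only an illustration: the helper names `det3` and `minor_rows13_cols23` and the choice of sample points are assumptions, not part of the text. It confirms that the Jacobian determinant vanishes while a 2 × 2 minor taken from the first and third rows does not (so the rank is exactly 2), and that the curve $z = -x^3 + 28$, $y = 2$ is mapped to b = (30, 10, 20).

```python
def F(x, y, z):
    return (x**3 + y + z, y**3 + y, x**3 - y**3 + z)

def jacobian(x, y, z):
    return [
        [3 * x**2, 1, 1],
        [0, 3 * y**2 + 1, 0],
        [3 * x**2, -3 * y**2, 1],
    ]

def det3(a):
    # Cofactor expansion along the first row.
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def minor_rows13_cols23(a):
    # 2x2 minor from rows 1 and 3, columns 2 and 3; it equals 1 + 3y^2.
    return a[0][1] * a[2][2] - a[0][2] * a[2][1]

# Sample integer points (an arbitrary choice) so that all arithmetic is exact.
grid = [(x, y, z) for x in range(-3, 4) for y in range(-3, 4) for z in range(-3, 4)]
rank_is_two = all(
    det3(jacobian(*p)) == 0 and minor_rows13_cols23(jacobian(*p)) != 0
    for p in grid
)
on_level_curve = all(F(x, 2, 28 - x**3) == (30, 10, 20) for x in range(-5, 6))
print(rank_is_two, on_level_curve, F(3, 2, 1))
```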


Theorem 3.5.1 (The Rank Theorem). Let r, p, q be nonnegative integers and let
$M = \mathbb{R}^{r+p}$, $N = \mathbb{R}^{r+q}$. Let $W \subseteq M$ be an open set and suppose that $F : W \to N$
is a continuously differentiable mapping. Assume that DF has rank r at each
point of W.

Fix a point $w \in W$. There exist vector subspaces $M_1, M_2 \subseteq M$ and $N_1, N_2 \subseteq N$
such that $M = M_1 + M_2$, $N = N_1 + N_2$, $\dim M_1 = \dim N_1 = r$, each
$m \in M$ has a unique representation $m = m_1 + m_2$ with $m_j \in M_j$, each $n \in N$
has a unique representation $n = n_1 + n_2$ with $n_j \in N_j$, and with the following
properties: Set $F(x) = F_1(x) + F_2(x)$ with $F_1(x) \in N_1$ and $F_2(x) \in N_2$ for each
$x \in W$. Then there is an open set $U \subseteq W$ with $w \in U$ such that

(1) $F_1(U)$ is an open set in $N_1$;

(2) For each $n_1 \in F_1(U)$ there is precisely one $n_2 \in N_2$ such that
$n_1 + n_2 \in F(U)$.


We see that the rank theorem says in a very precise sense that the image F(U)
of F is a graph over $F_1(U)$, and can thereby be seen to be a smooth, r-dimensional
surface (or manifold). The corresponding statements about the dimension and 
form of the level sets of F follow from a bit of further analysis, and we shall 
save those until after our consideration of the theorem. 

As with many theorems in this subject, the continuously differentiable version 
is modeled on a rather transparent paradigm that comes from linear algebra. We 
first treat that special instance:

Lemma 3.5.2 Let p, q, r be nonnegative integers and suppose that M, N are
vector spaces with dimensions r + p and r + q, respectively. Let $A : M \to N$
be a linear transformation, and assume that the rank of A is r. Then there exist
vector spaces $M_1, M_2 \subseteq M$ and $N_1, N_2 \subseteq N$ such that

(1) Each $m \in M$ can be written in a unique manner as $m = m_1 + m_2$ with
$m_j \in M_j$, j = 1, 2;

(2) Each $n \in N$ can be written in a unique manner as $n = n_1 + n_2$ with
$n_j \in N_j$, j = 1, 2;

(3) $Am_2 = 0$ for each $m_2 \in M_2$;

(4) A maps $M_1$ to $N_1$ in a one-to-one, onto manner;

(5) $\dim M_1 = \dim N_1 = r$.


Proof. The proof is just elementary linear algebra. Let $N_1$ be the image of A.
Select a basis $\{v_1, \ldots, v_{r+q}\}$ for N such that $\{v_1, \ldots, v_r\}$ is a basis for $N_1$ (whose
dimension we know to be r by the hypothesis on the rank of A). Define $N_2$ to be
the span of $\{v_{r+1}, \ldots, v_{r+q}\}$.

For $j = 1, \ldots, r$, choose vectors $u_j \in M$ such that $Au_j = v_j$. Then of course
$\{u_1, \ldots, u_r\}$ is a linearly independent set, and we let $M_1$ be the span of that set.
Let $M_2$ be the kernel of the operator A.

By fiat, property (2) now holds. Property (3) is true by definition, and so is
property (4). Property (5) is equally obvious.

If $m \in M$, then there are scalars $a_1, \ldots, a_r$ such that $Am = \sum_{j=1}^{r} a_j v_j$. Set
$m_1 = \sum_{j=1}^{r} a_j u_j$ and $m_2 = m - m_1$. Then $m_1 \in M_1$ and $Am_1 = Am$. Therefore
certainly $Am_2 = 0$ so that $m_2 \in M_2$.

Now properties (3) and (4) imply that $M_1 \cap M_2 = \{0\}$. Thus the representation
$m = m_1 + m_2$ is unique, so that (1) is proved. □
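
The construction in this proof can be followed by hand on a small case. The sketch below assumes a hypothetical 3 × 3 integer matrix of rank 2 (so r = 2 and p = q = 1), takes $u_1 = e_1$, $u_2 = e_2$ and $v_j = Au_j$, and splits an arbitrary vector m as $m_1 + m_2$ with $m_2$ in the kernel. The matrix and the function names are illustrative assumptions.

```python
# Hypothetical rank-2 matrix: the third row is the sum of the first two.
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]

def apply(matrix, vec):
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

u = [[1, 0, 0], [0, 1, 0]]          # u_1, u_2 span M_1
v = [apply(A, uj) for uj in u]      # v_1 = (1, 0, 1), v_2 = (0, 1, 1) span N_1

def decompose(m):
    # Write A m = a_1 v_1 + a_2 v_2. For this particular v_1, v_2 the
    # coefficients a_1, a_2 are simply the first two entries of A m.
    Am = apply(A, m)
    a1, a2 = Am[0], Am[1]
    m1 = [a1 * u[0][i] + a2 * u[1][i] for i in range(3)]
    m2 = [m[i] - m1[i] for i in range(3)]
    return m1, m2

m = [2, -5, 7]
m1, m2 = decompose(m)
print(m1, m2, apply(A, m2))
```

Here `apply(A, m2)` should be the zero vector, which is property (3), while `apply(A, m1)` agrees with `apply(A, m)`.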


Now we will transfer this basic linear algebra fact to the continuously differen- 
tiable context. 


Proof of Theorem 3.5.1. With $A = DF(w)$, $M = \mathbb{R}^{r+p}$, $N = \mathbb{R}^{r+q}$, let the
spaces $M_1, M_2, N_1, N_2$ be as in the lemma. Let $T = A|_{M_1}$. By part (4) of the
lemma, T is a linear isomorphism of $M_1$ onto $N_1$.

Since we will use the projections onto all four of $M_1$, $M_2$, $N_1$, and $N_2$ we
introduce some notation for these mappings. Let $P_j$ be the linear projection of M
onto $M_j$ given by $P_j(m_1 + m_2) = m_j$ and let $Q_j$ be the linear projection of N
onto $N_j$ given by $Q_j(n_1 + n_2) = n_j$. With this notation, we have $F_j = Q_jF$,
$A = AP_1 = Q_1A = Q_1AP_1$, $0 = AP_2 = Q_2A = Q_2AP_2$, $T^{-1}A = P_1$, and
$AT^{-1}Q_1 = Q_1$.

Set

$$G(x) = T^{-1}F_1(x) + P_2x = T^{-1}Q_1F(x) + P_2x, \quad x \in W. \tag{3.38}$$

Differentiating, we find

$$DG(w) = T^{-1}Q_1(DF(w)) + P_2 = T^{-1}(Q_1A) + P_2 = T^{-1}A + P_2 = P_1 + P_2.$$

We conclude that DG(w) is the identity mapping on M. We may now apply the
inverse function theorem to the function G at the point w. We conclude that there
is a neighborhood U of w in W and a neighborhood V of G(w) such that G
is a one-to-one, onto, continuously differentiable mapping of U onto V. Taking
subsets if necessary, we may assume that V is convex (this hypothesis is only for
convenience, and is certainly not necessary).

Set $H = (G|_U)^{-1}$ and define

$$\Phi(z) = F(H(z)), \quad z \in V. \tag{3.39}$$

Since $AP_2 = 0$, (3.38) shows that $AG = AT^{-1}Q_1F = Q_1F$ so that

$$Q_1F(H(z)) = AG(H(z)) = Az = Q_1AP_1z.$$

Therefore

$$\Phi(z) = Q_1AP_1z + \varphi(z), \quad z \in V, \tag{3.40}$$

where $\varphi(z) \in N_2$.

By (3.39) and (3.40), $Q_1F(U)$ is the set of all points of the form $Q_1AP_1z$,
$z \in V$. Since V is open and $N_1$ is the range of A, part (1) of the theorem is
proved.

To prove part (2) of the theorem, we need to show that $\varphi(z)$ depends only on
$P_1z$. So fix an element $z \in V$. By (3.39) and (3.40), we know that

$$D\Phi(z) = DF(H(z)) \cdot DH(z) = Q_1AP_1 + D\varphi(z). \tag{3.41}$$

Since DH(z) is an invertible linear operator on M, and since DF(H(z)) has rank
r, we see that the range R of $D\Phi(z)$ is a vector space of dimension r. Since
the range of $D\varphi(z)$ is in $N_2$, (3.41) shows that $Q_1AP_1 = Q_1(D\Phi(z))$. Hence $Q_1$
maps R onto $N_1$ (the range of $Q_1AP_1$); since both of the spaces R and $N_1$ have
dimension r we conclude that $Q_1 : R \to N_1$ is also one-to-one. Thus, for $z \in V$,

$$Q_1AP_1h = 0 \quad \text{implies} \quad D\Phi(z)h = 0. \tag{3.42}$$

If we now have the setup $z \in V$, $h_2 \in M_2$, $z + h_2 \in V$, then we define

$$\Lambda(t) = \Phi(z + th_2), \quad 0 \le t \le 1. \tag{3.43}$$

Because V is convex, this definition is valid and sensible. Since $Ah_2 = 0$, (3.42)
and (3.43) imply that

$$D\Lambda(t) = D\Phi(z + th_2)h_2 = 0, \quad 0 < t < 1.$$

Thus $\Lambda(1) = \Lambda(0)$ or $\Phi(z + h_2) = \Phi(z)$. Since $Q_1AP_1h_2 = 0$, (3.40) then gives
$\varphi(z + h_2) = \varphi(z)$; this is what we needed to prove to establish (2). □


It is worth noting that if $n \in N$ is a value of the mapping F and if $m \in M$
is an element of $F^{-1}(n)$, then the Jacobian of K(x) = F(x) − n (in the entries
$v_1, \ldots, v_r$) with respect to the variables $u_1, \ldots, u_r$ has maximal rank. Therefore
the implicit function theorem applies and we see that we can describe the level set
as a smooth, parameterized p-dimensional manifold. This is often the practical
significance of the rank theorem (although it is generally not formulated in this
way).

It is not difficult to see that Theorem 3.5.1 is also true when the domain $\mathbb{R}^{r+p}$
and the range $\mathbb{R}^{r+q}$ of the mapping F are replaced by smooth manifolds. We leave
the details to the reader (on a local coordinate patch, simply map the domain and
range to appropriate Euclidean spaces).

We next formulate and prove a version of the inverse function theorem which
says in effect that a given $C^1$ invertible mapping can be factored into elementary
submappings. We begin with a definition. In what follows, let $\{e_1, \ldots, e_N\}$ be the
canonical orthonormal basis for Euclidean space.


Definition 3.5.3 Let $E \subseteq \mathbb{R}^N$ be open and $F : E \to \mathbb{R}^N$ a mapping. Assume
that, for some fixed, positive integer j,

$$e_i \cdot F(x) = e_i \cdot x$$

for all $x \in E$ and $i \ne j$. This hypothesis simply says that x and F(x) have the
same i-th coordinate when $i \ne j$. In other words, F acts only in the j-th coordinate.
In the terminology of Rudin [Ru 64], such a mapping is called primitive.


A primitive mapping is a rather specialized object; in fact it appears to be too
particular to bear much scrutiny. But the decomposition theorem that we now
present tells us that any $C^1$ mapping with invertible derivative may be factored,
locally, into primitive mappings and interchanges of coordinates.


Theorem 3.5.4 Suppose that F is a $C^1$ mapping of an open set $E \subseteq \mathbb{R}^N$ into $\mathbb{R}^N$.
Assume that $0 \in E$, $F(0) = 0$, and $\det DF(0) \ne 0$. Then there is a neighborhood
U of 0 in $\mathbb{R}^N$ in which the representation

$$F(x) = G_N(B_N(G_{N-1}(B_{N-1} \cdots G_1(B_1(x)) \cdots)))$$

is valid. Here each $G_j$ is a primitive $C^1$ mapping on U, $G_j(0) = 0$, and each $B_j$
is a linear operator on $\mathbb{R}^N$ which is either the identity or which interchanges two
coordinates.


Proof. The proof proceeds by building on the number of mappings in the
factorization. We will construct a sequence of mappings $F_m$ which come closer and
closer to satisfying the conclusion of the theorem. The inductive statement $(P_m)$
will be

• The mapping $F_m$ maps a neighborhood $U_m$ of $0 \in \mathbb{R}^N$ into $\mathbb{R}^N$,
$F_m(0) = 0$, $F_m$ is of class $C^1$, $A_m = DF_m(0)$ is invertible, and

$$e_i \cdot F_m(\xi) = e_i \cdot \xi \tag{3.44}$$

holds for $\xi \in U_m$ and $1 \le i < m$.

For m = 1 we set $F_1 = F$ and note that $(P_1)$ is obviously true since there are
then no $e_i$ with $1 \le i < m$.

Assume now that $(P_\nu)$ has been proved for $1 \le \nu \le m$. We now establish
$(P_{m+1})$. Set $\alpha_{ij} = e_i \cdot A_me_j$. Then (3.44) tells us that $\alpha_{ij} = 0$ if $i < m \le j$.
If we also knew that $\alpha_{mj} = 0$ for all $j \ge m$, then the representation
$A_me_j = \sum_i \alpha_{ij}e_i$ would show that the collection $A_me_m, \ldots, A_me_N$ of N + 1 − m linearly
independent vectors lies in the span of the N − m vectors $e_{m+1}, \ldots, e_N$; that is
a contradiction. Thus there is an index j, $m \le j \le N$, such that $\alpha_{mj} \ne 0$. Fix
this j.

Define projection operators $P_m$ such that $P_me_i = e_i$ if $i \ne m$ and $P_me_m = 0$.
Also define linear operators $B_m$ satisfying $B_me_m = e_j$, $B_me_j = e_m$, and $B_me_i = e_i$
for $i \ne j$, $i \ne m$. Put

$$G_m(x) = P_mx + [e_m \cdot F_m(B_m(x))]\,e_m. \tag{3.45}$$

Then $G_m$ is obviously primitive. Since $D(F_mB_m)(0) = A_mB_m$, we see by
differentiation that

$$DG_m(0)h = P_mh + [e_m \cdot A_mB_mh]\,e_m \quad \text{for } h \in \mathbb{R}^N.$$

If $DG_m(0)h = 0$, then the last line shows that $P_mh = 0$, so that $h = \lambda e_m$
for some scalar $\lambda$. But we also know that $e_m \cdot A_mB_mh = 0$, or $\lambda\alpha_{mj} = 0$ by the
definition of $B_m$. Since $\alpha_{mj} \ne 0$, we see that $\lambda = 0$, hence h = 0.

We thus see that $DG_m(0)$ is one-to-one. Therefore it is invertible, and the
inverse function theorem then implies that $G_m$ is one-to-one on a neighborhood $U_m$
of 0. Also $G_m(U_m) = V_m$ is an open subset of $\mathbb{R}^N$. Define

$$F_{m+1}(y) = F_m(B_mG_m^{-1}(y)), \quad y \in V_m. \tag{3.46}$$

If $y \in V_m$ with $y = G_m(x)$, $x \in U_m$, then (3.45) shows that

$$e_m \cdot y = e_m \cdot F_m(B_mx) \quad \text{and} \quad e_i \cdot y = e_i \cdot x \ \text{ for } i < m. \tag{3.47}$$
As a result, (3.44) and the definition of $B_m$ imply that

$$e_i \cdot F_{m+1}(y) = e_i \cdot B_mx = e_i \cdot x = e_i \cdot y$$

if i < m. Also (3.47) implies that

$$e_m \cdot F_{m+1}(y) = e_m \cdot F_m(B_mx) = e_m \cdot y.$$

We see as a consequence of these calculations that $F_{m+1}$ satisfies $(P_{m+1})$.
Rewriting (3.46) in the form

$$F_m(x) = F_{m+1}(G_m(B_mx)), \quad m = 1, 2, \ldots, N,$$

and noting that $F_{N+1}$ is the identity, we finally see that

$$\begin{aligned}
F(x) = F_1(x) &= F_2[G_1(B_1x)] \\
&= F_3[G_2(B_2G_1(B_1x))] \\
&\;\;\vdots \\
&= F_{N+1}[G_N(B_N(G_{N-1}(B_{N-1} \cdots G_1(B_1(x)) \cdots)))] \\
&= G_N(B_N(G_{N-1}(B_{N-1} \cdots G_1(B_1(x)) \cdots))),
\end{aligned}$$

which gives the conclusion we desire. □


3.6 A Counterexample 


Example 3.6.1 This example will show that one cannot omit the hypothesis in
the inverse function theorem requiring the derivative to be continuous. Consider
$f(x) = \alpha x + x^2\sin(1/x)$, where $0 < \alpha < 1$. Extend f to x = 0 by setting
f(0) = 0. This function is differentiable everywhere and $f'(0) = \alpha \ne 0$. In fact,
we compute

$$f'(x) = \alpha + 2x\sin(1/x) - \cos(1/x) \quad \text{for } x \ne 0, \tag{3.48}$$

while the derivative at 0 is obtained by directly examining the limit of the
difference quotient.

We note that $f'$ is not continuous at zero, so the hypothesis in the inverse
function theorem that the function be $C^1$ is not satisfied. We will show below that
f does not have an inverse function in any neighborhood of 0.

It is clear from freshman calculus that at any point where $f'(x) = 0$ and
$f''(x) \ne 0$, there cannot be a local inverse. We claim that there are infinitely
many such points in any neighborhood of 0. From (3.48) and the fact that $|\alpha| < 1$,
it is clear that there are infinitely many zeros of $f'$ in any neighborhood of 0. It
remains to show that such zeros of $f'$ are not also zeros of $f''$. We will prove this
assertion by contradiction.

We compute

$$f''(x) = (2 - 1/x^2)\sin(1/x) - (2/x)\cos(1/x) \quad \text{for } x \ne 0. \tag{3.49}$$

If both $f'(x) = 0$ and $f''(x) = 0$ hold for some $x \ne 0$, then the system of
simultaneous linear equations

$$\begin{aligned}
2xS - C &= -\alpha \\
(2 - 1/x^2)S - (2/x)C &= 0
\end{aligned}$$

has the solution S = sin(1/x), C = cos(1/x). On the other hand, we can apply
Cramer's rule to the linear system to conclude that

$$S = \alpha\,\frac{-2x}{1 + 2x^2}, \tag{3.50}$$

$$C = \alpha\,\frac{1 - 2x^2}{1 + 2x^2}. \tag{3.51}$$

If we take S and C to be given by (3.50) and (3.51), then we see that

$$S^2 + C^2 = \alpha^2\,\frac{1 + 4x^4}{(1 + 2x^2)^2} \tag{3.52}$$

and, for small nonzero values of x, the right-hand side of (3.52) is not equal to
1. Thus S and C cannot be equal to sin(1/x) and cos(1/x), respectively, and,
consequently, x cannot be a zero both of $f'$ and of $f''$.
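
The oscillation of the derivative near 0 is easy to observe numerically. The sketch below is only an illustration under stated assumptions: the value 0.5 for the constant in (3.48) and the sample points $1/((2k+1)\pi)$ and $1/(2k\pi)$ are choices made here, not taken from the text. At the first kind of point cos(1/x) = −1, so the derivative is close to the constant plus 1; at the second kind cos(1/x) = 1, so it is close to the constant minus 1. The derivative therefore changes sign between each such pair, arbitrarily close to 0.

```python
import math

ALPHA = 0.5  # assumed sample value with 0 < ALPHA < 1

def f_prime(x):
    # Formula (3.48), valid for x != 0.
    return ALPHA + 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

def sign_change_pairs(count):
    pairs = []
    for k in range(1, count + 1):
        a = 1.0 / ((2 * k + 1) * math.pi)  # cos(1/a) = -1, so f'(a) > 0
        b = 1.0 / (2 * k * math.pi)        # cos(1/b) = +1, so f'(b) < 0
        pairs.append((a, b, f_prime(a), f_prime(b)))
    return pairs

pairs = sign_change_pairs(50)
for a, b, fa, fb in pairs[:3]:
    print(f"f'({a:.5f}) = {fa:+.4f},  f'({b:.5f}) = {fb:+.4f}")
```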


4 
Applications 


4.1 Ordinary Differential Equations 


There is a strong connection between the implicit function theorem and the theory 
of differential equations. This is true even from the historical point of view, for Pi- 
card’s iterative proof of the existence theorem for ordinary differential equations 
inspired Goursat to give an iterative proof of the implicit function theorem (see 
Goursat [Go 03]). In the mid-twentieth century, John Nash pioneered the use of 
a sophisticated form of the implicit function theorem in the study of partial dif- 
ferential equations. We will discuss Nash’s work in Section 6.4. In this section, 
we limit our attention to ordinary (rather than partial) differential equations be- 
cause the technical details are then so much simpler. Our plan is first to show how 
a theorem on the existence of solutions to ordinary differential equations can be 
used to prove the implicit function theorem. Then we will go the other way by 
using a form of the implicit function theorem to prove an existence theorem for 
differential equations. 

A typical existence theorem for ordinary differential equations is the following
fundamental result¹ (see, for example, Hurewicz [Hu 64]):

Theorem 4.1.1 (Picard) If $F(t, x)$, $(t, x) \in \mathbb{R} \times \mathbb{R}^N$, is continuous in the
(N + 1)-dimensional region $(t_0 - a, t_0 + a) \times B(x_0, r)$, then there exists a solution x(t)
of

$$\frac{dx}{dt} = F(t, x), \quad x(t_0) = x_0, \tag{4.1}$$

defined over an interval $(t_0 - h, t_0 + h)$.

¹This fundamental theorem is commonly known as Picard's existence and uniqueness theorem.
The classical proof uses a method that has come to be known as the Picard iteration technique. See
[Pi 93].


Remark 4.1.2 The solution of (4.1) need not be unique if F is only continuous.
For example, the problem of finding x(t) satisfying $x' = x^{2/3}$, x(0) = 0, has the
two solutions $x \equiv 0$ and $x(t) = (t/3)^3$. To guarantee that the solution of (4.1) is
unique, it is sufficient to assume additionally that F satisfies a Lipschitz condition
as a function of x.
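
Both solutions in this remark can be checked directly. The following sketch (function names and sample times are assumptions made for illustration) compares the exact derivative of $(t/3)^3$, namely $(t/3)^2$, with the right-hand side $x^{2/3}$ at a few nonnegative times, and does the same for the zero solution.

```python
def rhs(x):
    # Right-hand side of x' = x^(2/3), evaluated for x >= 0.
    return x ** (2.0 / 3.0)

def cubic_solution(t):
    return (t / 3.0) ** 3

def cubic_solution_derivative(t):
    # d/dt (t/3)^3 = (t/3)^2
    return (t / 3.0) ** 2

times = [0.0, 0.5, 1.0, 2.0, 3.0]
residuals = [abs(cubic_solution_derivative(t) - rhs(cubic_solution(t))) for t in times]
zero_residual = abs(0.0 - rhs(0.0))  # the zero solution has x' = 0 = 0^(2/3)
print(max(residuals), zero_residual)
```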


We can give an alternative proof of the implicit function theorem as a corollary 
of Theorem 4.1.1. 


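Theorem 4.1.3 Suppose that $U \subseteq \mathbb{R}^{N+1}$ is open and that $H : U \to \mathbb{R}^N$ is $C^1$. If
$H(t_0, x_0) = 0$, $(t_0, x_0) \in \mathbb{R} \times \mathbb{R}^N$, and the N × N matrix

$$\left(\frac{\partial H_i}{\partial x_j}(t_0, x_0)\right)_{i,j=1,2,\ldots,N}$$

is nonsingular, then there exists an open interval $(t_0 - h, t_0 + h)$ and a continuously
differentiable function $\phi : (t_0 - h, t_0 + h) \to \mathbb{R}^N$ such that $\phi(t_0) = x_0$ and

$$H(t, \phi(t)) = 0.$$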


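Proof. We consider the case N = 1 in some detail. First, choose a, r > 0 so that
$(t_0 - a, t_0 + a) \times (x_0 - r, x_0 + r) \subseteq U$ and $(\partial H/\partial x)(t, x)$ is nonvanishing on
$(t_0 - a, t_0 + a) \times (x_0 - r, x_0 + r)$. Then define
$F : (t_0 - a, t_0 + a) \times (x_0 - r, x_0 + r) \to \mathbb{R}$ by setting

$$F(t, x) = -\frac{\partial H}{\partial t}(t, x) \Big/ \frac{\partial H}{\partial x}(t, x). \tag{4.2}$$

Since F is continuous, we can apply Theorem 4.1.1 to conclude that there exists
a solution of the problem

$$\frac{dx}{dt} = F(t, x), \quad x(t_0) = x_0$$

defined on an interval $(t_0 - h, t_0 + h)$. We define $\phi : (t_0 - h, t_0 + h) \to \mathbb{R}$ by
setting

$$\phi(t) = x(t).$$

Note that

$$\phi(t_0) = x_0 \tag{4.3}$$

and

$$\phi'(t) = \frac{dx}{dt} = F(t, x(t)) = -\frac{\partial H}{\partial t}(t, \phi(t)) \Big/ \frac{\partial H}{\partial x}(t, \phi(t)). \tag{4.4}$$

By (4.3), we have $H(t_0, \phi(t_0)) = H(t_0, x_0) = 0$, and by (4.4), we have

$$\frac{d}{dt}H(t, \phi(t)) = \frac{\partial H}{\partial t}(t, \phi(t)) + \frac{\partial H}{\partial x}(t, \phi(t))\,\phi'(t) = 0.$$

Thus we have $H(t, \phi(t)) = 0$ on the interval $(t_0 - h, t_0 + h)$.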


In case N > 1, we choose a, r > 0 so that $(t_0 - a, t_0 + a) \times B(x_0, r) \subseteq U$ and
so that the N × N matrix

$$D_xH = \left(\frac{\partial H_i}{\partial x_j}\right)_{i,j=1,2,\ldots,N}$$

is nonsingular on $(t_0 - a, t_0 + a) \times B(x_0, r)$. Next, we replace (4.2) by

$$F(t, x) = -[D_xH(t, x)]^{-1}\left(\frac{\partial H}{\partial t}(t, x)\right).$$

The proof then proceeds as before. □
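
The case N = 1 of this proof can be imitated numerically: integrate the differential equation with right-hand side (4.2) and check that H stays at zero along the solution. The sketch below is an illustration under assumptions not found in the text: it takes $H(t, x) = t^2 + x^2 - 1$ with $(t_0, x_0) = (0, 1)$, for which the implicit function is $\sqrt{1 - t^2}$, and it uses the classical fourth-order Runge-Kutta method as the integrator.

```python
import math

def H(t, x):
    # Assumed example: the unit circle, with H(0, 1) = 0 and dH/dx(0, 1) = 2.
    return t * t + x * x - 1.0

def F(t, x):
    # Right-hand side (4.2): -(dH/dt) / (dH/dx) = -(2t) / (2x).
    return -(2.0 * t) / (2.0 * x)

def rk4(f, t0, x0, t_end, steps):
    # Classical fourth-order Runge-Kutta integration of x' = f(t, x).
    h = (t_end - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

phi_half = rk4(F, 0.0, 1.0, 0.5, 100)
print(phi_half, math.sqrt(0.75), H(0.5, phi_half))
```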


Remark 4.1.4 The proof of Theorem 4.1.3 given above is clearly limited to the
case of one independent variable in the implicitly defined function. The case of
one dependent variable and several independent variables can be obtained by
replacing (4.2) with the appropriate system of first-order partial differential
equations. The system of partial differential equations is solved by applying the
existence theorem for ordinary differential equations (Theorem 4.1.1), with
parameters, to each independent variable in turn. For example, if we have the equation
H(x, y, z) = 0 which we are considering near a point $(x_0, y_0, z_0)$ where H is zero
and $\partial H/\partial z$ is nonzero (so the implicit function z(x, y) will involve two
independent variables), then there will be two first-order partial differential equations that
z(x, y) must satisfy:

$$\frac{\partial z}{\partial x} = -\frac{\partial H}{\partial x} \Big/ \frac{\partial H}{\partial z}, \tag{4.5}$$

$$\frac{\partial z}{\partial y} = -\frac{\partial H}{\partial y} \Big/ \frac{\partial H}{\partial z}. \tag{4.6}$$

Restricting to a neighborhood of $(x_0, y_0, z_0)$ in which $\partial H/\partial z$ is non-vanishing
will enable us to conclude by an appeal to Rolle's theorem that the function
z(x, y) is uniquely defined, without the hypotheses for the uniqueness of
solutions of ordinary differential equations.

The second equation, (4.6), is solved by solving the first equation, (4.5), with
$y = y_0$ fixed and with the initial condition $z = z_0$. Then the resulting function
$z(x, y_0)$ is used to provide the initial condition for (4.6); the initial value problem
is then solved while treating x as a parameter. This process will produce a solution
of (4.6) in an open set about the point $(x_0, y_0)$. By carrying out the same process,
but in the other order, we can obtain a solution of (4.5) in an open set about
the same point $(x_0, y_0)$. Because of the form of the right-hand sides of (4.5) and
(4.6), those two solutions will be consistent and will define just one function that
satisfies both equations.

Finally, the general implicit function theorem for any number of dependent
and independent variables can be proved using Dini's induction procedure (see
Section 3.2). □


We have seen that the implicit function theorem can be treated, in a sense, as 
a corollary of the existence theorem for ordinary differential equations. What we 
would like to do next is prove the converse: that we can use the implicit function 
theorem to prove the existence of solutions to ordinary differential equations. The 
Banach space methods of Section 3.4 will be required for this argument. We recall 
the statement of the theorem: 


Theorem 4.1.1 If $F(t, x)$, $(t, x) \in \mathbb{R} \times \mathbb{R}^N$, is continuous in the
(N + 1)-dimensional region $(t_0 - a, t_0 + a) \times B(x_0, r)$, then there exists a solution x(t)
of

$$\frac{dx}{dt} = F(t, x), \quad x(t_0) = x_0, \tag{4.1}$$

defined over an interval $(t_0 - h, t_0 + h)$.


Proof. For convenience of notation, let us suppose that $t_0 = 0$.

Let $B_0$ be the space of bounded continuous $\mathbb{R}^N$-valued functions on (−a, a),
normed by the supremum of the magnitude of the function. Let $B_1$ be the space
of bounded continuously differentiable $\mathbb{R}^N$-valued functions on (−a, a) that also
have a bounded derivative. We norm this space by the sum of the supremum of the
magnitude of the function and the supremum of the magnitude of the derivative
of the function. We define a map $\mathcal{F} : B_1 \to B_0 \times \mathbb{R}^N$ by setting

$$\mathcal{F}[x(t)] = [x'(t) - F(t, x(t)),\; x(0) - x_0].$$

With this notation, a solution of (4.1) is given by a zero of $\mathcal{F}$.
We imbed the problem of solving $\mathcal{F}[x] = [0, 0]$ into a larger problem. Define
$\mathcal{H} : [0, 1] \times B_1 \to B_0 \times \mathbb{R}^N$ by setting

$$\mathcal{H}[\alpha, X(\tau)] = [X'(\tau) - \alpha F(\alpha\tau, X(\tau)),\; X(0) - x_0].$$


We observe that

$$\mathcal{H}[0, x_0] = [0, 0], \tag{4.7}$$

where $x_0$ in (4.7) represents the constant function. Also, we observe that the
Fréchet derivative of $\mathcal{H}$ at $[0, x_0]$ is given by $X \mapsto X'$. It follows from the implicit
function theorem for Banach spaces, Theorem 3.4.10, that for all small enough
choices of $\alpha$ there exists an $X(\alpha, \tau)$ such that

$$D_\tau X(\alpha, \tau) - \alpha F(\alpha\tau, X(\alpha, \tau)) = 0, \quad X(\alpha, 0) = x_0.$$

For such an $\alpha > 0$, we define x(t) by setting

$$x(t) = X(\alpha, t/\alpha).$$

It follows that

$$x'(t) = \frac{1}{\alpha}\,D_\tau X(\alpha, t/\alpha) = \frac{1}{\alpha} \cdot \alpha \cdot F(\alpha(t/\alpha), X(\alpha, t/\alpha)) = F(t, x(t)).$$

Thus our differential equation is solved, and the theorem is proved. □


4.2 Numerical Homotopy Methods 


Suppose we wish to solve a system of nonlinear equations

$$F(x) = 0 \tag{4.8}$$

where $F : \mathbb{R}^N \to \mathbb{R}^N$ is smooth. Only in very special circumstances will it
be possible to solve (4.8) in closed form; generally, numerical methods must be 
employed and an approximate solution thereby obtained. Of course, we would 
probably like to apply Newton’s method, but for that we need a reasonable starting 
point for the iteration. In case we do not have such a reasonable starting point for 
Newton’s method, some alternative procedure is needed. One such method is the 
homotopy method (also called the continuation or imbedding method). 

In the homotopy method, we imbed the problem of interest, (4.8), into a larger
problem of finding the zeros of a function $H : \mathbb{R}^{N+1} \to \mathbb{R}^N$. However, the
function H is to be specially chosen so that the function $F_0 : \mathbb{R}^N \to \mathbb{R}^N$ defined
by setting

$$F_0(x) = H(0, x) \tag{4.9}$$

is one that we understand well, while the function F in which we are interested is
given by

$$F(x) = H(1, x).$$

The plan then is to follow the zeros of H from a starting point $(0, x_0) \in \mathbb{R}^{N+1}$
with $F_0(x_0) = 0$ along a curve (t(s), x(s)), $0 \le s \le 1$, for which

$$H(t(s), x(s)) = 0, \tag{4.10}$$

to a point $(1, x_1) = (t(1), x(1))$ where we will have $F(x_1) = 0$. This will solve
the original problem (4.8).

Figure 4.1. Nice Continuation [panels (a) and (b)]

Two standard choices for H are the convex homotopy defined by setting

$$H(t, x) = (1 - t)F_0(x) + tF(x)$$

and the global homotopy defined by setting

$$H(t, x) = F(x) - (1 - t)F(x_0),$$

where $x_0$ can be any convenient value.
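
Both homotopies are simple to write down in code. The sketch below uses scalar functions for brevity; the particular F, the starting value, the choice $F_0(x) = x - x_0$, and the names `convex_homotopy` and `global_homotopy` are assumptions made for illustration. It shows that each H reduces to an easily solved problem at t = 0 and to F at t = 1.

```python
def convex_homotopy(F0, F):
    # H(t, x) = (1 - t) F0(x) + t F(x)
    def H(t, x):
        return (1.0 - t) * F0(x) + t * F(x)
    return H

def global_homotopy(F, x0):
    # H(t, x) = F(x) - (1 - t) F(x0); x0 is a zero of H(0, .) by construction.
    F_at_x0 = F(x0)
    def H(t, x):
        return F(x) - (1.0 - t) * F_at_x0
    return H

def F(x):
    # Assumed sample problem.
    return x**3 - 2.0 * x - 5.0

x0 = 1.0

def F0(x):
    # Assumed "well understood" function with the single zero x0.
    return x - x0

Hc = convex_homotopy(F0, F)
Hg = global_homotopy(F, x0)
print(Hc(0.0, x0), Hg(0.0, x0), Hc(1.0, 2.5) - F(2.5), Hg(1.0, 2.5) - F(2.5))
```

With the global homotopy no separate $F_0$ has to be chosen, which is one reason it is convenient when nothing is known about F in advance.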

The picture we would like to see for the curve (t(s), x(s)) along which (4.10)
holds should resemble that in Figure 4.1(a). It would be even better if the curve
resembled that in Figure 4.1(b), because in that case we could parameterize the
curve by t itself. On the other hand, it is conceivable that the solution set of

$$H(t, x) = 0$$

might look like that in Figure 4.2 where, starting from a zero of the form $(0, x_0)$,
we can never arrive at a zero of the form $(1, x_1)$. Notice that there are four types
of bad behavior for $\{(t, x) : H(t, x) = 0\}$ in Figure 4.2: (1) a curve starts at t = 0,
but doubles back without ever getting to t = 1, (2) a curve becomes unbounded
in x, (3) a curve reaches a bifurcation point where curves cross, and (4) a curve
comes to a dead end where it cannot be continued. All of these instances of bad
behavior are possible; nonetheless they all can be ruled out by imposing some
simple hypotheses and applying the implicit function theorem.

To illustrate the ideas, we first state a theorem in which we can show that the 
curve H(t(s), x(s)) = 0 has the nice form shown in Figure 4.1(b). 


Theorem 4.2.1 Let U be an open subset of $\mathbb{R}^N$. Suppose that H is continuously
differentiable in an open set containing $[0, 1] \times U$, that the function $F_0$ given by

$$F_0(x) = H(0, x)$$

has a zero $x_0$ in U, and that d > 0 is such that

$$\{x : |x - x_0| < d\} \subseteq U.$$

If, on all of $[0, 1] \times U$, the matrix

$$D_xH = \left(\frac{\partial H_i}{\partial x_j}\right)_{i,j=1,2,\ldots,N}$$

is nonsingular and

$$\left|(D_xH)^{-1}D_tH\right| < d$$

holds, where $D_tH$ is the column vector

$$D_tH = \left(\frac{\partial H_i}{\partial t}\right)_{i=1,2,\ldots,N},$$

then there is a continuously differentiable function x(t), defined for $0 \le t \le 1$,
satisfying

$$H(t, x(t)) = 0. \tag{4.11}$$

In particular, we have

$$F(x(1)) = H(1, x(1)) = 0.$$

Figure 4.2. Bad Continuation [horizontal axis from t = 0 to t = 1]


Proof. By the implicit function theorem applied at $(0, x_0)$, there exists a
continuously differentiable function x(t) defined in a neighborhood of 0 with $x(0) = x_0$
and satisfying (4.11) on that neighborhood. Thus there exists a positive number $t_0$
such that

(1) x(t) is defined on the half open interval $[0, t_0)$,

(2) $x(0) = x_0$,

(3) x(t) is continuously differentiable on $[0, t_0)$,

(4) H(t, x(t)) = 0 holds for $0 \le t < t_0$.

Set

$$t^* = \sup\{t_0 : \text{there exists } x(t) \text{ satisfying conditions (1)-(4)}\}.$$

We claim that $1 < t^*$, which will prove the result. To see this, note that, by the
implicit function theorem, we have

$$\frac{dx}{dt} = -(D_xH)^{-1}D_tH,$$

so that

$$|x'(t)| < d$$

holds whenever $t \le 1$ and $x'(t)$ exists and x(t) lies in U. It follows that, if it were
the case that $t^* \le 1$, then $x^* = \lim_{t \uparrow t^*} x(t)$ exists; moreover, the same bound gives
$|x^* - x_0| < d$, so that $x^*$ lies in U. Now we can apply the implicit
function theorem again, but at the point $(t^*, x^*)$, and thus extend x(t) to a larger
interval. This contradicts the definition of $t^*$. □


Example 4.2.2 Consider the equation

$$1 + x + e^x = 0. \tag{4.12}$$

If we define

$$H(t, x) = 1 + x + te^x,$$

then all the hypotheses of the theorem are satisfied by letting the interval (−2, 0)
play the role of the set U. Thus there exists a function x(t), defined for $0 \le t \le 1$,
with x(0) = −1 and

$$1 + x(t) + te^{x(t)} = 0. \tag{4.13}$$

In particular, $x(1) \approx -1.278465$ solves the equation (4.12). The curve is shown
in Figure 4.3.

Figure 4.3. Continuation Curve [vertical axis from x = −1.0 to x = −1.3]

Figure 4.4. Predictor-Corrector Path [horizontal axis from t = 0 to t = 1]


As was noted in the proof of Theorem 4.2.1, we know that x(t) is the solution
of the initial value problem

$$\frac{dx}{dt} = -\frac{e^x}{1 + te^x}, \quad x(0) = -1. \tag{4.14}$$

The observation that the curve satisfies an initial value problem for an ordinary
differential equation also can be used as a tool in finding x(1) by a
predictor-corrector method: Imagine we are incrementing t by a given step-size $\Delta t$. The
differential equation (4.14) can be used to predict a reasonable approximation of
$x(t + \Delta t)$ simply by using an "Euler step," $x(t + \Delta t) \approx x(t) + x'(t)\,\Delta t$.

Then a correction can be made by using the equation (4.13). With a fixed step size
$\Delta t$, this predictor-corrector process would produce a curve that is represented
schematically by the jagged approximation to the smooth curve shown in Figure 4.4. In
fact, for the particular example (4.13) and a small step size, the correction back to the
curve in Figure 4.3 is so small as to be imperceptible. □
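
The predictor-corrector idea can be sketched in a few lines. In the code below the predictor is an Euler step for the differential equation (4.14) and the corrector is Newton's method applied to (4.13) in the variable x at the new value of t. The step size 1/10, the number of Newton iterations, and the function names are assumptions made for this illustration, not values taken from the text.

```python
import math

def H(t, x):
    # H(t, x) = 1 + x + t e^x, as in (4.13).
    return 1.0 + x + t * math.exp(x)

def dH_dx(t, x):
    return 1.0 + t * math.exp(x)

def dH_dt(t, x):
    return math.exp(x)

def continue_root(x0=-1.0, steps=10, newton_iters=5):
    dt = 1.0 / steps          # assumed step size
    x = x0
    for k in range(steps):
        t = k * dt
        # Predictor: Euler step for dx/dt = -(dH/dt) / (dH/dx).
        x = x + dt * (-dH_dt(t, x) / dH_dx(t, x))
        # Corrector: Newton iterations on H(t_next, x) = 0 in x.
        t_next = (k + 1) * dt
        for _ in range(newton_iters):
            x = x - H(t_next, x) / dH_dx(t_next, x)
    return x

x1 = continue_root()
print(x1, H(1.0, x1))
```

The returned value should agree with the root quoted for (4.12) to the digits shown there.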


The hypotheses of the previous theorem rule out all possible bad behavior and
ensure that the curve (t(s), x(s)) never doubles back in the t direction, so it has
the simple structure x = x(t). A more general, but still nice, situation is provided
by the next theorem.


Theorem 4.2.3 Let U be a bounded open subset of $\mathbb{R}^N$. Suppose that H is
continuously differentiable in an open set containing $[0, 1] \times \overline{U}$, that the function $F_0$
given by

$$F_0(x) = H(0, x)$$

has a unique zero $x_0$ in U, and that $H(t, x) \ne 0$ holds for all $(t, x) \in [0, 1] \times \partial U$.
If DH(t, x) is of rank N for $(t, x) \in [0, 1] \times U$ and if the matrix

$$\left(\frac{\partial H_i}{\partial x_j}(0, x_0)\right)_{i,j=1,2,\ldots,N}$$

is nonsingular, then there are continuously differentiable functions t(s) and x(s),
defined for $0 \le s \le 1$, with t(0) = 0, t(1) = 1, and such that H(t(s), x(s)) = 0
holds. In particular, we have

$$F(x(1)) = H(t(1), x(1)) = 0.$$


Remark 4.2.4 The hypothesis that $F_0(x)$ has the unique zero $x_0$ rules out the
possibility that the curve (t(s), x(s)) that starts at $(0, x_0)$ and satisfies H(t(s), x(s)) = 0
could ever return to $\{0\} \times U$. The hypothesis that there are no zeros on $[0, 1] \times \partial U$
ensures that the curve remains confined to $[0, 1] \times U$. Finally, the hypothesis that
DH is of rank N guarantees that the set of points (t, x) for which H(t, x) = 0
really is a curve and that it cannot come to a bifurcation point or to a dead end. Thus
the curve must emerge somewhere and, since all other possibilities are excluded,
it must emerge at a point of the form $(1, x_1)$ with $x_1 \in U$.


Proof. Our main tool for constructing the curve will be Theorem 4.3.1 (to be
treated later), which gives various equivalent definitions for a smooth surface or,
in this case, a smooth curve. The proofs in Section 4.3 do not use the results of
this section.

We will begin with an arc-length parametrization, and at the end we make a
change of variables to obtain the parametrization required in the statement of the
theorem.

By the hypothesis that DH is of rank N, we can apply Theorem 4.3.1 to
conclude that, in a neighborhood of any point $(t^*, x^*)$ with $H(t^*, x^*) = 0$, there
exists a small section of curve satisfying

$$H(t(\sigma), x(\sigma)) = 0.$$

In particular, there is a section of curve through $(0, x_0)$. Let us suppose that this
section of curve is parametrized by arc-length with t(0) = 0 and $x(0) = x_0$.

Because

$$\left(\frac{\partial H_i}{\partial x_j}(0, x_0)\right)_{i,j=1,2,\ldots,N}$$

is nonsingular, we can conclude that $t'(0) \ne 0$. To see this, note that we have

$$(D_tH)\,t'(0) + (D_xH)\,x'(0) = 0,$$

so

$$x'(0) = -t'(0)\,(D_xH)^{-1}D_tH.$$

Thus, if it were the case that $t'(0) = 0$, then the curve $(t(\sigma), x(\sigma))$ would
have vanishing velocity, and that is not allowed under Theorem 4.3.1. Making
a change of variable from $\sigma$ to $-\sigma$ if necessary, we can suppose that $t'(0) > 0$.


By the preceding argument, we see that there exists a positive number $\sigma_0$ such
that

(1) $t(\sigma)$ and $x(\sigma)$ are defined on the half open interval $[0, \sigma_0)$,

(2) t(0) = 0 and $x(0) = x_0$,

(3) $t(\sigma)$ and $x(\sigma)$ are continuously differentiable on $[0, \sigma_0)$,

(4) $(t(\sigma), x(\sigma))$ is an arc-length parametrization,

(5) $H(t(\sigma), x(\sigma)) = 0$ holds for $0 \le \sigma < \sigma_0$.

Set

$$\sigma^* = \sup\{\sigma_0 : \text{there exist } (t(\sigma), x(\sigma)) \text{ satisfying conditions (1)-(5)}\}.$$

We claim that $1 < \sup_{\sigma \in [0, \sigma^*)} t(\sigma)$. If not, we let $(t^*, x^*)$ be an accumulation
point of $(t(\sigma), x(\sigma))$ and apply Theorem 4.3.1 at $(t^*, x^*)$. We conclude that, in
a sufficiently small neighborhood of $(t^*, x^*)$, the set of points (t, x) for which
H(t, x) = 0 is a curve of finite length. Hence $\sigma^* < \infty$ and the curve $(t(\sigma), x(\sigma))$,
$0 \le \sigma < \sigma^*$, can be extended past $\sigma^*$. This contradicts the definition of $\sigma^*$.
Finally, we set $\sigma_1 = \inf\{\sigma : t(\sigma) = 1\}$ and reparametrize by setting $s = \sigma/\sigma_1$. □


The preceding proof applies equally well with slightly weaker hypotheses, so
we state that result as a corollary.


Corollary 4.2.5 Under the circumstances of Theorem 4.2.3, if all the hypotheses
of that theorem hold except that $DH(t, x)$ is assumed to be of rank $N$ for all
$(t, x) \in ((0, 1] \times U) \cap \{(t, x) : H(t, x) = 0\}$ rather than for all $(t, x) \in (0, 1] \times U$,
then the conclusion of Theorem 4.2.3 still holds.


When, in a particular case, we wish to apply Theorem 4.2.3, we might find that 
it is not true that DH is of rank N in the whole region (0, 1] x U. It might not 
even be true that DH is of rank N on the set 


{(t, x): H(t, x) =0}. 


Nonetheless, if $H$ is $C^2$ and we are willing to replace the value $0 \in \mathbb{R}^N$ by some
nearby value, then we can arrange for $DH$ to be of rank $N$ on the inverse image
of that value. This is a consequence of Sard's theorem. For completeness, we state
Sard's theorem next.


Definition 4.2.6 Let $U \subset \mathbb{R}^M$ and $V \subset \mathbb{R}^K$ be open sets. If $H : U \to V$ is a $C^1$
function, then we say $y \in V$ is a critical value for $H$ if there exists $x \in U$ with
$H(x) = y$ and for which the rank of the matrix

$$\left( \frac{\partial H_i}{\partial x_j} \right)_{\substack{i=1,2,\dots,K \\ j=1,2,\dots,M}}$$

is less than $K$.


Theorem 4.2.7 (Sard) Let $U \subset \mathbb{R}^M$ and $V \subset \mathbb{R}^K$ be open sets. If $H : U \to V$ is
a $C^k$ function, where $k > M/K$, then the set of critical values of $H$ has Lebesgue
$K$-dimensional measure zero.


The reader can find a proof of Sard's theorem for the case in which $H$ is $C^\infty$
in Krantz and Parks [KP 99; Section 5.1]. A more general form of the theorem
together with its proof can be found in Federer [Fe 69; Section 3.4].
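
As a quick illustration of Definition 4.2.6 (our own example, not taken from the references above), consider the smooth function $H(x) = x^3 - 3x$ on $U = \mathbb{R}$, so that $M = K = 1$. Its critical values form a finite set, which is consistent with the conclusion of Theorem 4.2.7:

```latex
\[
  H'(x) = 3x^2 - 3 = 0 \iff x = \pm 1,
  \qquad
  H(1) = -2, \quad H(-1) = 2 ,
\]
\[
  \{\text{critical values of } H\} = \{-2,\ 2\},
  \qquad
  \mathcal{L}^1\bigl(\{-2,\ 2\}\bigr) = 0 .
\]
```

In particular, for every $c \notin \{-2, 2\}$ the derivative $DH$ has rank 1 at each point of the inverse image $H^{-1}(c)$.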

In our present context of numerical homotopy methods, we have the following 
result. 


Theorem 4.2.8 Let $U$ be a bounded open subset of $\mathbb{R}^N$. Suppose that $H$ is $C^2$ in
an open set containing $[0, 1] \times \overline{U}$, that the function $F_0$ given by

$$F_0(x) = H(0, x)$$

has a unique zero $x_0$ in $U$, and that $H(t, x) \neq 0$ holds for all $(t, x) \in (0, 1] \times \partial U$.

If the matrix

$$\left( \frac{\partial H_i}{\partial x_j}(0, x_0) \right)_{i,j=1,2,\dots,N}$$

is nonsingular then, for almost every choice of $c$ in a sufficiently small neighborhood
of 0, there are continuously differentiable functions $t(s)$ and $x(s)$, defined
for $0 \le s \le 1$, with $t(0) = 0$, $t(1) = 1$, and such that $H(t(s), x(s)) = c$ holds.
In particular, we have

$$F_1(x(1)) = H(t(1), x(1)) = c.$$


Proof. We claim that, for all choices of $c$ in a sufficiently small neighborhood of
$0 \in \mathbb{R}^N$, there is a unique $x$ in $U$ with $F_0(x) = c$. Because $D_x F_0$ is nonsingular at
$x_0$, there is a neighborhood $V$ of $x_0$ that maps one-to-one and onto a neighborhood
$W$ of $0 \in \mathbb{R}^N$. So, for $c$ sufficiently near to 0, there exists at least one $x \in V$ with
$F_0(x) = c$. If there existed another distinct $x'$ with $F_0(x') = c$, then we would
have $x' \notin V$. Considering a sequence of $c$'s that converge to 0, but for which
distinct $x$ and $x'$ exist with $F_0(x) = F_0(x') = c$, we would conclude that the
sequence of $x$'s contained in $V$ converges to $x_0$, while the other sequence of $x'$'s,
or at least a subsequence of them, would converge to another distinct zero of $F_0$.
This is a contradiction.

A similar argument shows that, for $c$ in a sufficiently small neighborhood of
$0 \in \mathbb{R}^N$, $H(t, x) \neq c$ holds for all $(t, x) \in [0, 1] \times \partial U$.

Now, by Sard's Theorem 4.2.7, the set of critical values $c \in \mathbb{R}^N$ is of Lebesgue
$N$-dimensional measure zero. Therefore, for almost every choice of $c$ near enough
to 0, there is a unique $x_c$ with $F_0(x_c) = c$, the matrix

$$\left( \frac{\partial H_i}{\partial x_j}(0, x_c) \right)_{i,j=1,2,\dots,N}$$

is nonsingular, and there is no point in $(0, 1] \times \partial U$ at which $H$ equals $c$. The result
now follows by applying Corollary 4.2.5 to $H(t, x) - c$. □
We have touched only briefly on the topic of numerical homotopy methods;
there is a substantial literature on the subject and its applications. We direct the 
reader to Allgower and Georg [AG 90] for a fuller exposition and additional ref- 
erences. 
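
To make the curve-following idea concrete, here is a minimal sketch in Python of a predictor-corrector scheme for a scalar homotopy. Everything specific in it is our own illustrative assumption rather than part of the text: the target function `f`, the convex homotopy `H(t, x) = t f(x) + (1 - t)(x - a)`, the starting point `a`, and the step counts. For this choice $D_x H > 0$ everywhere, so the zero curve has no turning points and the sketch parametrizes by $t$ rather than by arc length; the predictor uses the relation $x' = -t'\,(D_x H)^{-1} D_t H$ from the proof of Theorem 4.2.3.

```python
def f(x):
    # Hypothetical target function; its only real zero is x = 2.
    return x**3 + x - 10.0


def df(x):
    return 3.0 * x**2 + 1.0


def H(t, x, a):
    # Convex homotopy between the trivial map x - a (t = 0) and f (t = 1).
    return t * f(x) + (1.0 - t) * (x - a)


def dH_dx(t, x, a):
    # Strictly positive for 0 <= t <= 1, so D_x H is always nonsingular here.
    return t * df(x) + (1.0 - t)


def dH_dt(t, x, a):
    return f(x) - (x - a)


def trace_zero_curve(a=0.0, steps=50, newton_iters=5):
    """Follow the zero set of H from (0, a) to t = 1 and return x(1)."""
    t, x = 0.0, a
    h = 1.0 / steps
    for k in range(steps):
        # Euler predictor along the curve: dx/dt = -(D_x H)^{-1} D_t H.
        x = x - h * dH_dt(t, x, a) / dH_dx(t, x, a)
        t = (k + 1) / steps
        # Newton corrector in x at the new, fixed value of t.
        for _ in range(newton_iters):
            x -= H(t, x, a) / dH_dx(t, x, a)
    return x


if __name__ == "__main__":
    print(trace_zero_curve())  # approximates the zero of f
```

When $D_x H$ can become singular along the curve, a parametrization by $t$ breaks down at turning points, which is why the proof above works with arc length instead; Allgower and Georg [AG 90] treat that general situation.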


4.3 Equivalent Definitions of a Smooth Surface


In studying geometric analysis, one is often interested in considering a smooth 
surface in Euclidean space. We may all believe that we understand intuitively 
what the term “smooth surface in Euclidean space” means, but, of course, a precise
definition must be given. In fact, there are many acceptable definitions for
“smooth surface in Euclidean space.” In this section, we will consider five equiv- 
alent definitions that correspond to the following five heuristic descriptions of a 
smooth surface: 


• The surface can be smoothly straightened.
• The surface solves a system of smooth equations.
• The surface can be smoothly parametrized.
• The surface has smooth local coordinates.
• The surface is the graph of a smooth function.


The following theorem gives a precise expression to each of these heuristic de- 
scriptions. Of course, the point of the theorem is that the five possible definitions 
are equivalent, so any one of the five can be used to define “smooth surface in 
Euclidean space.” To distinguish the precise from the intuitive, we then introduce
the technical term regularly imbedded $C^k$ submanifold of $\mathbb{R}^N$ in Definition 4.3.2.


Theorem 4.3.1 Let $M$, $N$ be integers with $1 \le M < N$. Let $k$ be equal either
to an integer greater than or equal to 1 or to $+\infty$. Let $S$ be a subset of $\mathbb{R}^N$. The
following are equivalent:


(1) [The surface can be smoothly straightened.] For each point $p \in S$ there
exist a neighborhood $U \subset \mathbb{R}^N$ of $p$, a $C^k$ diffeomorphism $\phi : U \to \mathbb{R}^N$,
and an $M$ dimensional linear subspace $L \subset \mathbb{R}^N$ such that

$$\phi(S \cap U) = L \cap \phi(U). \tag{4.15}$$


(2) [The surface solves a system of smooth equations.] For each point $p \in S$
there exist a neighborhood $U \subset \mathbb{R}^N$ of $p$ and a $C^k$ function
$f : U \to \mathbb{R}^{N-M}$ such that

$$S \cap U = f^{-1}(q), \tag{4.16}$$

where $q = f(p)$, and

$$\operatorname{rank} Df(x) = N - M \tag{4.17}$$

holds for all $x \in U$.


(3) [The surface can be smoothly parametrized.] For each $p \in S$ there exist an
open $V \subset \mathbb{R}^M$ and a $C^k$ function $g : V \to \mathbb{R}^N$ such that $g$ maps each open
subset of $V$ onto a relatively open subset of $S$, $p$ is in the image of $g$, and

$$\operatorname{rank} Dg(x) = M \tag{4.18}$$

holds for all $x \in V$.


(4) [The surface has smooth local coordinates.] For each point $p \in S$ there
exist a neighborhood $U \subset \mathbb{R}^N$ of $p$, a convex open $W \subset \mathbb{R}^M$, and $C^k$
functions $\phi : U \to W$, $\psi : W \to U$ such that

$$S \cap U = \psi(W) \tag{4.19}$$

and $\phi \circ \psi$ is the identity map.


(5) [The surface is the graph of a smooth function.] For each point $p \in S$ there
exist a neighborhood $V \subset \mathbb{R}^N$ of $p$ such that $S \cap V$ is the graph of
a $C^k$ function. More precisely, there exists a permutation of coordinates
$\Theta : \mathbb{R}^N \to \mathbb{R}^N$, an open set $\mathcal{U} \subset \mathbb{R}^M$, and a $C^k$ function $F : \mathcal{U} \to \mathbb{R}^{N-M}$,
with $DF(u)$ of rank $M$ for all $u \in \mathcal{U}$, such that

$$S \cap V = \Theta\bigl(\{(u, F(u)) : u \in \mathcal{U} \subset \mathbb{R}^M\}\bigr). \tag{4.20}$$


Definition 4.3.2 If $S$ satisfies any or, equivalently, all of the conditions of
Theorem 4.3.1, then we say that $S$ is an $M$-dimensional, regularly imbedded $C^k$
submanifold of $\mathbb{R}^N$.
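
As an illustration (our own worked example, with the point $p$ and the neighborhoods chosen for convenience), the unit circle in $\mathbb{R}^2$ satisfies the conditions of Theorem 4.3.1 with $N = 2$, $M = 1$. Near $p = (0, 1)$, conditions (2), (3), and (5) can be checked directly:

```latex
% Unit circle S = {(x,y) : x^2 + y^2 = 1}, N = 2, M = 1, near p = (0,1).
\begin{align*}
 \text{(2)}\quad & f(x,y) = x^2 + y^2 - 1 \ \text{on } U = \{(x,y) : y > 0\}, \quad
                   S \cap U = f^{-1}(0), \quad Df(x,y) = (2x,\ 2y) \neq 0 \ \text{on } U; \\
 \text{(3)}\quad & g(\theta) = (\cos\theta,\ \sin\theta), \ \theta \in V = (0,\pi), \quad
                   Dg(\theta) = (-\sin\theta,\ \cos\theta)^{T} \neq 0; \\
 \text{(5)}\quad & F(u) = \sqrt{1-u^2}, \ u \in (-1,1), \quad
                   S \cap \{(x,y) : y > 0\} = \{(u, F(u)) : -1 < u < 1\}.
\end{align*}
```

Here the permutation of coordinates in (5) is the identity, and each description covers the same open upper semicircle.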


Proof of Theorem 4.3.1. The implicit function theorem and the inverse function 
theorem are the main tools used in showing the equivalence of the statements. 


(1) ⇒ (2) Let $U$, $\phi$, and $L$ be as in (1). Choose an orthonormal set of $N - M$
vectors $v_1, v_2, \dots, v_{N-M}$, all orthogonal to $L$. We define $f : U \to \mathbb{R}^{N-M}$ by
setting

$$f(x) = \bigl(v_1 \cdot \phi(x),\ v_2 \cdot \phi(x),\ \dots,\ v_{N-M} \cdot \phi(x)\bigr).$$

Clearly, the function $f$ is $C^k$. We see from (4.15) that $f(p) = 0$ and $S \cap U =
f^{-1}(0)$. Because $\phi$ is a diffeomorphism, the rank of $Df$ must be $N - M$ everywhere
in $U$.


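(2) ⇒ (3) Let $U$, $f$, and $q$ be as in (2). Since $\operatorname{rank} Df(x) = N - M$ holds for
all $x \in U$, we can select indices $1 \le k_1 < k_2 < \cdots < k_{N-M} \le N$ so that

$$\det \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_{k_1}} & \dfrac{\partial f_1}{\partial x_{k_2}} & \cdots & \dfrac{\partial f_1}{\partial x_{k_{N-M}}} \\
\dfrac{\partial f_2}{\partial x_{k_1}} & \dfrac{\partial f_2}{\partial x_{k_2}} & \cdots & \dfrac{\partial f_2}{\partial x_{k_{N-M}}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_{N-M}}{\partial x_{k_1}} & \dfrac{\partial f_{N-M}}{\partial x_{k_2}} & \cdots & \dfrac{\partial f_{N-M}}{\partial x_{k_{N-M}}}
\end{pmatrix} \neq 0 \tag{4.21}$$

holds at $p$.
Let $1 \le j_1 < j_2 < \cdots < j_M \le N$ be the complementary indices so that

$$\{j_1, j_2, \dots, j_M\} \cup \{k_1, k_2, \dots, k_{N-M}\} = \{1, 2, \dots, N\}.$$

Set

$$\xi_1 = x_{j_1},\ \xi_2 = x_{j_2},\ \dots,\ \xi_M = x_{j_M},$$
$$\eta_1 = x_{k_1},\ \eta_2 = x_{k_2},\ \dots,\ \eta_{N-M} = x_{k_{N-M}},$$

and

$$\tilde{f}(\xi_1, \xi_2, \dots, \xi_M, \eta_1, \eta_2, \dots, \eta_{N-M}) = f(x_1, x_2, \dots, x_N) - q.$$

Then we can apply the implicit function theorem to the function $\tilde{f}$ at the point
$(p_{j_1}, p_{j_2}, \dots, p_{j_M}, p_{k_1}, p_{k_2}, \dots, p_{k_{N-M}})$ to conclude that there exist a neighborhood
$V \subset \mathbb{R}^M$ of $(p_{j_1}, p_{j_2}, \dots, p_{j_M})$ and a $C^k$ function $\tilde{g} : V \to \mathbb{R}^{N-M}$ so
that

$$\tilde{f}(\xi, \tilde{g}(\xi)) = 0$$

holds for all $\xi \in V$. Defining $g : V \to \mathbb{R}^N$ by setting

$$g_{j_1}(\xi) = \xi_1,\ g_{j_2}(\xi) = \xi_2,\ \dots,\ g_{j_M}(\xi) = \xi_M$$

and

$$g_{k_1}(\xi) = \tilde{g}_1(\xi),\ g_{k_2}(\xi) = \tilde{g}_2(\xi),\ \dots,\ g_{k_{N-M}}(\xi) = \tilde{g}_{N-M}(\xi),$$

we see that the conditions in (3) hold.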


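(3) ⇒ (4) Let $V \subset \mathbb{R}^M$ and $g : V \to \mathbb{R}^N$ be as in (3). Let $v \in V$ be such that
$g(v) = p$. Since $\operatorname{rank} Dg(x) = M$ holds for all $x \in V$, we can choose indices
$1 \le i_1 < i_2 < \cdots < i_M \le N$ so that

$$\det \begin{pmatrix}
\dfrac{\partial g_{i_1}}{\partial x_1} & \dfrac{\partial g_{i_1}}{\partial x_2} & \cdots & \dfrac{\partial g_{i_1}}{\partial x_M} \\
\dfrac{\partial g_{i_2}}{\partial x_1} & \dfrac{\partial g_{i_2}}{\partial x_2} & \cdots & \dfrac{\partial g_{i_2}}{\partial x_M} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g_{i_M}}{\partial x_1} & \dfrac{\partial g_{i_M}}{\partial x_2} & \cdots & \dfrac{\partial g_{i_M}}{\partial x_M}
\end{pmatrix} \neq 0 \tag{4.22}$$

holds at $v$.
By the inverse function theorem, there exists an open set $W$ with $v \in W \subset V$
on which the map $\hat{g}$ defined by setting

$$\hat{g}(x) = \bigl(g_{i_1}(x),\ g_{i_2}(x),\ \dots,\ g_{i_M}(x)\bigr)$$

is one-to-one and has a $C^k$ inverse. Without loss of generality, we may assume
that $W$ is convex.
Set

$$U = \{(u_1, u_2, \dots, u_N) : (u_{i_1}, u_{i_2}, \dots, u_{i_M}) \in \hat{g}(W)\},$$

and define $\phi : U \to W$ by setting

$$\phi(u_1, u_2, \dots, u_N) = \hat{g}^{-1}(u_{i_1}, u_{i_2}, \dots, u_{i_M}).$$

We see that the conditions of (4) hold with $\psi = g|_W$.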


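(4) ⇒ (5) Let $U$, $W$, $\phi$, and $\psi$ be as in (4). Since $\phi \circ \psi$ is the identity, $D\psi$
must be of rank $M$ at each point of $W$, in particular, at $q = \phi(p)$. Thus we can
choose indices $1 \le i_1 < i_2 < \cdots < i_M \le N$ so that

$$\det \begin{pmatrix}
\dfrac{\partial \psi_{i_1}}{\partial w_1} & \dfrac{\partial \psi_{i_1}}{\partial w_2} & \cdots & \dfrac{\partial \psi_{i_1}}{\partial w_M} \\
\dfrac{\partial \psi_{i_2}}{\partial w_1} & \dfrac{\partial \psi_{i_2}}{\partial w_2} & \cdots & \dfrac{\partial \psi_{i_2}}{\partial w_M} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial \psi_{i_M}}{\partial w_1} & \dfrac{\partial \psi_{i_M}}{\partial w_2} & \cdots & \dfrac{\partial \psi_{i_M}}{\partial w_M}
\end{pmatrix} \neq 0 \tag{4.23}$$

holds at $q$.
Let $1 \le j_1 < j_2 < \cdots < j_{N-M} \le N$ be the complementary indices so that

$$\{i_1, i_2, \dots, i_M\} \cup \{j_1, j_2, \dots, j_{N-M}\} = \{1, 2, \dots, N\}.$$

Define the orthogonal transformation $\Psi : \mathbb{R}^N \to \mathbb{R}^N$ by setting

$$\Psi(x_1, x_2, \dots, x_N) = (x_{i_1}, x_{i_2}, \dots, x_{i_M}, x_{j_1}, x_{j_2}, \dots, x_{j_{N-M}}).$$

Let $\Pi_1 : \mathbb{R}^M \times \mathbb{R}^{N-M} \to \mathbb{R}^M$ and $\Pi_2 : \mathbb{R}^M \times \mathbb{R}^{N-M} \to \mathbb{R}^{N-M}$ be the
projections onto the first and second factors, respectively. By (4.23), we can apply
the inverse function theorem to $\Pi_1 \circ \Psi \circ \psi$ at the point $u_0 = \Pi_1 \circ \Psi \circ \psi(q)$ to
see that

$$f = (\Pi_1 \circ \Psi \circ \psi)^{-1} : \mathcal{U} \subset \mathbb{R}^M \to \mathbb{R}^M$$

exists, is $C^k$, and $Df$ has rank $M$ at each point $u \in \mathcal{U}$. Setting

$$F = \Pi_2 \circ \Psi \circ \psi \circ f, \qquad \Theta = \Psi^{-1},$$

and taking $\mathcal{U}$ to be a possibly smaller neighborhood of $u_0$, we obtain (4.20).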


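(5) ⇒ (1) Let $V$, $\Theta$, and $\mathcal{U}$ be as in (5). Let $\Pi_1 : \mathbb{R}^M \times \mathbb{R}^{N-M} \to \mathbb{R}^M$ and
$\Pi_2 : \mathbb{R}^M \times \mathbb{R}^{N-M} \to \mathbb{R}^{N-M}$ be the projections onto the first and second factors,
respectively. Let $u_0 \in \mathcal{U}$ be such that $p = (u_0, F(u_0))$, and let $L$ be the image
under $\Theta$ of the tangent plane to the graph of $F$ at $(u_0, F(u_0))$.

We define $H = \Pi_1 \circ \Theta^{-1}$ and $V = \Pi_2 \circ \Theta^{-1} - F \circ H$. Then, defining
$\phi : \mathbb{R}^N \to \mathbb{R}^N$ by setting²

$$\phi(x) = \Theta\bigl(H(x),\ V(x) + F(u_0) + \langle DF(u_0),\ \Pi_1 \circ \Theta^{-1}(x) - u_0 \rangle\bigr),$$

we see that (4.15) holds for a small enough neighborhood of $p$. □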


We close with a remark about the significance of the phrase “regularly imbed- 
ded.” In the geometric theory of manifolds, one learns that a manifold is a geomet- 
ric object that is locally equivalent to Euclidean space—the equivalence provided 
by a smooth mapping. The definition of manifold is independent of any imbed- 
ding. And it is entirely possible to map a smooth manifold into Euclidean space 
in a nonsmooth way. For example, the unit interval (which is certainly a smooth 
manifold) can be mapped into the plane in a continuous and one-to-one manner 
so that the image has positive area (see Osgood [Os 03]). Such a curve is not even 
rectifiable, hence is certainly not smooth. 

When a smooth mapping of a manifold into Euclidean space is called an “imbed- 
ding,” it is always required that the mapping be one-to-one, so self-intersections 
are not allowed. Even in the absence of self-intersections, it may or may not be 
true that the following property holds: 


The mapping is a homeomorphism onto its image, when the image 
has the relative topology. 


Some authors require this stronger homeomorphism property of every mapping 
that they call an imbedding. In our lexicon, we use the term “regular imbedding” 
when demanding the homeomorphism property. As we see from Theorem 4.3.1, a 
regularly imbedded manifold sits in space in a smooth and geometrically natural 
manner, much as the line $\{(x, y) : y = 0\}$ sits in the plane $\mathbb{R}^2$.


²Recall from Section 3.3 that we use the notation $\langle\,\cdot\,,\,\cdot\,\rangle$ to denote the application of the Jacobian
matrix to a vector.
4.4 Smoothness of the Distance Function


In the study of various problems of analysis that may be set in a bounded domain 
in Euclidean space, it is often useful, or even necessary, to consider the distance 
from a point to the boundary of the domain. Of course one can also consider the 
distance to the boundary of a domain when measured from a point outside the 
domain. Assigning a positive sign to the distance to the boundary from the inside 
and a negative sign to the distance to the boundary from the outside, one is led to 
the idea of the signed distance function to a surface. 

In what follows we will not always refer to the inside and outside of a domain 
because we can equivalently use a choice of unit normal vector to orient a surface. 

It may be noted that, in general, the ordinary distance will not be a smooth
function. For example, let $S$ be a line in $\mathbb{R}^2$. Then the function

$$\varrho(x) = \inf\{|s - x| : s \in S\}$$

is not even $C^1$. One of the main motivations for considering the signed distance
function is that it is maximally smooth under minimal hypotheses.
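
A small numerical sketch in Python makes the contrast visible. It assumes, purely for illustration, that the line $S$ is the horizontal axis $\{y = 0\}$ in $\mathbb{R}^2$ and that the upper half-plane is the side with positive sign; the function names are ours. The one-sided difference quotients of the unsigned distance disagree at a point of $S$, while those of the signed distance agree.

```python
def dist_to_x_axis(p):
    # Unsigned distance from p = (x, y) to the line {y = 0}.
    return abs(p[1])


def signed_dist_to_x_axis(p):
    # Signed distance: positive above the line, negative below it.
    return p[1]


def one_sided_quotients(func, p, h=1e-6):
    """Return the (left, right) difference quotients of func at p in the y-direction."""
    x, y = p
    right = (func((x, y + h)) - func((x, y))) / h
    left = (func((x, y - h)) - func((x, y))) / (-h)
    return left, right


if __name__ == "__main__":
    p = (0.3, 0.0)  # a point on the line
    print(one_sided_quotients(dist_to_x_axis, p))         # slopes of opposite sign
    print(one_sided_quotients(signed_dist_to_x_axis, p))  # equal slopes
```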

Let us assume for purposes of a motivating discussion that the surface is smooth
and that we are investigating the signed distance function in a region near enough
to the surface that, for each point in the region, there is a unique nearest point of
the surface.³ The most straightforward way to measure the distance from a point
$p$ to the surface is to locate the nearest point $\xi(p)$ on the surface and compute
the length of the vector from the nearest point to the original point. To get the
signed distance function, one uses the fact that the vector from $\xi(p)$ to $p$ must
be perpendicular to the surface at $\xi(p)$. To take advantage of this geometry, one
chooses a continuous unit normal field on the surface and computes the inner
product of that unit normal at that nearest point $\xi(p)$ with the vector from the
nearest point to the original point.

In this process of finding the nearest point on the surface and taking the in- 
ner product with the unit normal vector, it seems that at least one differentiation 
has taken place and that the signed distance function would be expected to be 
one order of differentiability less smooth than the surface. In fact, the signed 
distance function is just as smooth as the original surface. The key to proving 
this is in making efficient use of the implicit function theorem or of the inverse 
function theorem. As far as we know, this result first appeared⁴ in Gilbarg and
Trudinger [GT 77; Lemma 1, page 382].⁵ Other proofs can be found in Krantz
and Parks [KP 81], Foote [Fo 84], and Krantz and Parks [KP 99; Theorem 1.2.6,
page 12]. Lars Hörmander was good enough to describe to us a proof using the
calculus of variations [Ho 00].


³In fact, this is a nontrivial hypothesis. A surface with this property is called a “set of positive
reach.” It is known, and we will prove below using the implicit function theorem, that a $C^2$ surface is
a set of positive reach.

⁴On page 50 of Hörmander [Ho 66] it is observed in passing that the implicit function theorem can
be used to show the distance function is as smooth as the boundary.

⁵In the second edition, Gilbarg and Trudinger [GT 83], the result appears as Lemma 14.16,
page 355.


General Facts about the Distance Function 


In the proof of the next proposition we show that, for an arbitrary closed set, 
it follows from the triangle inequality that the distance function is a Lipschitz 
function with Lipschitz constant 1. 


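Proposition 4.4.1 Let $S$ be a closed subset of $\mathbb{R}^N$. Then for any $x, y \in \mathbb{R}^N$ it
holds that

$$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| \le |x - y|.$$

Proof. Let $x, y \in \mathbb{R}^N$ be arbitrary points and let $v \in S$ be such that

$$\operatorname{dist}(y, S) = |y - v|$$

holds. Then we have

$$\operatorname{dist}(x, S) \le |x - v| \le |x - y| + |y - v|,$$

so

$$\operatorname{dist}(x, S) - \operatorname{dist}(y, S) \le |x - y|$$

holds. Similarly, we have

$$\operatorname{dist}(y, S) - \operatorname{dist}(x, S) \le |x - y|,$$

proving the result. □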
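
The Lipschitz estimate is easy to test numerically. The following Python sketch assumes, for illustration only, that $S$ is a small finite subset of $\mathbb{R}^2$ (the sample points, the sampling box, and the function names are our own choices) and counts how often the inequality of Proposition 4.4.1 fails on random pairs of points.

```python
import math
import random


def dist_to_set(x, S):
    # Distance from the point x to the finite (hence closed) set S.
    return min(math.dist(x, s) for s in S)


def lipschitz_violations(S, trials=1000, seed=0):
    """Count random pairs (x, y) with |dist(x,S) - dist(y,S)| > |x - y|."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
        y = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
        gap = abs(dist_to_set(x, S) - dist_to_set(y, S))
        # The small slack only absorbs floating-point rounding.
        if gap > math.dist(x, y) + 1e-12:
            count += 1
    return count


if __name__ == "__main__":
    sample_set = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]
    print(lipschitz_violations(sample_set))
```

A random test of this kind cannot prove the proposition, but a nonzero count would signal a mistake in either the code or the statement.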


To learn more about the distance function we will need to investigate the be- 
havior (on its domain) of the unique nearest point function. First, we give the 
definition. 


Definition 4.4.2 Let $S$ be a closed subset of $\mathbb{R}^N$ and denote by $\Xi(S)$ the set of
$x \in \mathbb{R}^N$ such that

$$\xi_1 \in S,\ \xi_2 \in S,\ \text{and } |x - \xi_1| = |x - \xi_2| = \operatorname{dist}(x, S) \text{ imply } \xi_1 = \xi_2.$$

Define the nearest point function $\xi : \Xi(S) \to S$ by requiring

$$\xi(x) \in S \text{ and } |x - \xi(x)| = \operatorname{dist}(x, S),$$

for $x \in \Xi(S)$. Of course, $\Xi(S)$ is exactly the set on which the nearest point
function $\xi$ is well defined.


In the proof of the next lemma we will see that it is elementary to show that, for
an arbitrary closed set, the unique nearest point is continuous where it is defined.


Lemma 4.4.3 For any closed $S \subset \mathbb{R}^N$, the function $\xi : \Xi(S) \to S$ is continuous.
Proof. Arguing by contradiction, suppose that $x_1, x_2, \dots$ is a sequence in $\Xi(S)$
that converges to $x \in \Xi(S)$, but that there is $\epsilon > 0$ such that $|\xi(x_i) - \xi(x)| \ge \epsilon$
holds for all $i = 1, 2, \dots$.

It is clear that $\xi(x_1), \xi(x_2), \dots$ is a bounded sequence so, by passing to a
subsequence if necessary, but without changing notation, we may assume that $\xi(x_i)$
converges to some point $z \in S$. It is also clear that $|z - x| = \operatorname{dist}(x, S)$, so by
the definition of $\Xi(S)$ we must have $\xi(x) = z$, contradicting the assumption that
$\xi(x_i)$ stays at least a distance of $\epsilon$ away from $\xi(x)$. □


Even for a completely arbitrary subset of Euclidean space, the directional deriva- 
tives of the distance function can often be shown to behave well, where they exist. 
The next two lemmas appear in Federer [Fe 59]. 


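Lemma 4.4.4 For any closed $S \subset \mathbb{R}^N$, let the function $\varrho : \mathbb{R}^N \to \mathbb{R}$ be defined
by setting

$$\varrho(x) = \operatorname{dist}(x, S).$$

If $x \in \Xi(S) \setminus S$ and if the directional derivative of $\varrho$ in the direction of the unit
vector $v$ exists, then that directional derivative equals

$$v \cdot \frac{x - \xi(x)}{\varrho(x)}. \tag{4.24}$$


Proof. Fix $x \in \Xi(S) \setminus S$ and write simply $\xi$ for $\xi(x)$. Also write $r = |x - \xi|$.
Fix the unit vector $v$ and suppose that the directional derivative

$$\lim_{t \to 0} \frac{\varrho(x + tv) - \varrho(x)}{t}$$

exists.
Since $\xi \in S$, we have

$$\begin{aligned}
\varrho(x + tv) \le |x + tv - \xi| &= \sqrt{(x + tv - \xi) \cdot (x + tv - \xi)} \\
&= \sqrt{(x - \xi) \cdot (x - \xi) + 2t\, v \cdot (x - \xi) + t^2\, v \cdot v} \\
&= \sqrt{r^2 + 2t\, v \cdot (x - \xi) + t^2}.
\end{aligned}$$

So we have

$$\begin{aligned}
\lim_{t \to 0} \frac{\varrho(x + tv) - \varrho(x)}{t}
&= \lim_{t \downarrow 0} \frac{\varrho(x + tv) - \varrho(x)}{t}
\le \lim_{t \downarrow 0} \frac{\sqrt{r^2 + 2t\, v \cdot (x - \xi) + t^2} - r}{t} \\
&= \lim_{t \downarrow 0} \frac{2\, v \cdot (x - \xi) + t}{\sqrt{r^2 + 2t\, v \cdot (x - \xi) + t^2} + r}
= \frac{v \cdot (x - \xi)}{r}
\end{aligned}$$

and

$$\begin{aligned}
\lim_{t \to 0} \frac{\varrho(x + tv) - \varrho(x)}{t}
&= \lim_{t \uparrow 0} \frac{\varrho(x + tv) - \varrho(x)}{t}
\ge \lim_{t \uparrow 0} \frac{\sqrt{r^2 + 2t\, v \cdot (x - \xi) + t^2} - r}{t} \\
&= \frac{v \cdot (x - \xi)}{r},
\end{aligned}$$

and the result follows. □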
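
Formula (4.24) can be checked numerically in a simple case. The Python sketch below assumes, as our own illustration, that $S$ is the unit circle in $\mathbb{R}^2$, for which the nearest point to any $x \neq 0$ is $x/|x|$; it compares (4.24) with a central difference quotient of the distance function. The test point, direction, and step size are arbitrary choices.

```python
import math


def rho(x):
    # Distance from x to the unit circle in the plane.
    return abs(1.0 - math.hypot(x[0], x[1]))


def xi(x):
    # Nearest point of the unit circle to x (defined for x != 0).
    n = math.hypot(x[0], x[1])
    return (x[0] / n, x[1] / n)


def formula_4_24(x, v):
    # v . (x - xi(x)) / rho(x), valid for x off the circle and x != 0.
    p = xi(x)
    return (v[0] * (x[0] - p[0]) + v[1] * (x[1] - p[1])) / rho(x)


def difference_quotient(x, v, t=1e-6):
    # Central difference approximation of the directional derivative of rho.
    xp = (x[0] + t * v[0], x[1] + t * v[1])
    xm = (x[0] - t * v[0], x[1] - t * v[1])
    return (rho(xp) - rho(xm)) / (2.0 * t)


if __name__ == "__main__":
    x, v = (0.5, 0.2), (0.6, 0.8)  # v is a unit vector
    print(formula_4_24(x, v), difference_quotient(x, v))
```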


Remark 4.4.5 By Lemma 4.4.3, the expression in (4.24) is continuous on $\Xi(S) \setminus S$.
This observation will be used in the proof of the next theorem to conclude that
the distance function is continuously differentiable on the interior of $\Xi(S) \setminus S$. If
the closed set $S$ under consideration is actually a $C^1$ submanifold, then one can
make a direct argument based on examining the difference quotient to conclude
that (4.24) gives the directional derivative at all points in the interior of $\Xi(S) \setminus S$.
In this way the less elementary argument used in the proof of the next theorem
can be avoided for a $C^1$ submanifold (see Foote [Fo 84; Theorem 2]).


Theorem 4.4.6 For any closed $S \subset \mathbb{R}^N$, the function $\varrho : \mathbb{R}^N \to \mathbb{R}$ defined by
setting

$$\varrho(x) = \operatorname{dist}(x, S)$$

is continuously differentiable on the interior of $\Xi(S) \setminus S$.


Proof. Consider a point in the interior of $\Xi(S) \setminus S$ and the line $L$ through that
point and parallel to the $i$th coordinate axis. Since by Proposition 4.4.1 we have

$$|\varrho(x_1) - \varrho(x_2)| \le |x_1 - x_2|$$

for any pair of points $x_1, x_2 \in \mathbb{R}^N$, it follows that $\varrho$ is absolutely continuous on
$L$ and thus that $\varrho$ is the integral of its derivative along $L$. But the derivative of $\varrho$
along $L$ is the $i$th partial derivative of $\varrho$ and, by Lemma 4.4.4, where it exists it
equals the continuous function

$$\frac{x - \xi(x)}{\varrho(x)} \cdot e_i.$$

Here $e_i$ denotes the $i$th standard basis vector for $\mathbb{R}^N$. Consequently, $\varrho$ restricted
to $L$ intersected with the interior of $[\Xi(S) \setminus S]$ is the integral of a continuous
function, and thus is continuously differentiable on $L$.

It follows that all the partial derivatives of $\varrho$ exist and are continuous on the
interior of $[\Xi(S) \setminus S]$, so $\varrho$ is continuously differentiable on that set. □


The Distance to a Submanifold 


Next we present a useful construction which, for a smooth submanifold $M$ of $\mathbb{R}^N$,
gives a smooth parametrization of a tubular neighborhood about $M$ near any
point. The ideas here go back to Hotelling [Ho 39] and the seminal paper of
Weyl [We 39]. For a thorough treatment of tubes, the reader should see Gray
[Gr 90].


Lemma 4.4.7 (Local Tubular Neighborhood Lemma) Let $M \subset \mathbb{R}^N$ be a compact
$C^k$ submanifold of dimension $K$, where we assume $k \ge 2$ and $N - 1 \ge K \ge 1$.
Then, for each point $P$ in $M$, there exist a neighborhood $W$ of $P$ in $M$, a $C^k$
function $F : W' \to \mathbb{R}^N$, where $W' \subset \mathbb{R}^K$ is open, and an orthonormal set of
$C^{k-1}$ vector fields $V_j : W' \to \mathbb{R}^N$, $j = 1, 2, \dots, N - K$, such that


(1) $F(W') = W$,

(2) $\operatorname{rank}(DF) = K$ at every point of $W'$,

(3) $V_j(x)$ is normal to $M$ at $F(x)$ for each $x \in W'$ and for each $j = 1, 2, \dots,
N - K$,

(4) $\Phi : W' \times \mathbb{R}^{N-K} \to \mathbb{R}^N$ defined by

$$\Phi(x, y) = F(x) + \sum_{j=1}^{N-K} y_j V_j(x), \tag{4.25}$$

for $x \in W'$ and $y = (y_1, y_2, \dots, y_{N-K}) \in \mathbb{R}^{N-K}$, is a $C^{k-1}$ function,
is one-to-one on $\{x\} \times \mathbb{R}^{N-K}$ for each $x \in W'$, and is one-to-one in a
neighborhood of $W' \times \{0\}$.


Proof. Because $M \subset \mathbb{R}^N$ is a $C^k$ submanifold, we can choose a neighborhood
$W$ of $P$ in $M$ that is parametrized by

$$F : W' \subset \mathbb{R}^K \to \mathbb{R}^N$$

where $\operatorname{rank}(DF) = K$ at every point of $W'$. Thus (1) and (2) are satisfied.
By taking a smaller neighborhood than $W'$ if necessary, but without changing
notation, we can arrange that the same $K$ rows,

$$i = i_1, i_2, \dots, i_K, \tag{4.26}$$

of the $N \times K$ matrix

$$\left( \frac{\partial F_i}{\partial x_j} \right)_{\substack{i=1,\dots,N \\ j=1,\dots,K}} \tag{4.27}$$

are independent at every point of $W'$. Now, because the same rows of the matrix
(4.27) are independent at every point of $W'$, we will be able to define a set of
$C^{k-1}$ vector fields

$$T_1(x), T_2(x), \dots, T_K(x),\ V_1(x), V_2(x), \dots, V_{N-K}(x),$$

for $x \in W'$, that form an orthonormal basis for $\mathbb{R}^N$ for each choice of $x \in W'$.
We may further suppose that

$$T_1(x), T_2(x), \dots, T_K(x)$$

are all tangent to $M$ at $F(x)$, while

$$V_1(x), V_2(x), \dots, V_{N-K}(x)$$

are all normal to $M$ at $F(x)$. One way to do this is to apply the Gram–Schmidt
orthogonalization procedure to the set of vectors consisting of

$$\frac{\partial F}{\partial x_j}, \quad j = 1, 2, \dots, K,$$

and the standard basis vectors

$$e_{i'_1}, e_{i'_2}, \dots, e_{i'_{N-K}},$$

where $i'_1, i'_2, \dots, i'_{N-K}$ are the indices not occurring in (4.26). Thus, (3) is
satisfied.

Because of (1) and (3), it is clear that $\Phi$ defined by (4.25) is a $C^{k-1}$ function.
Observe that $D\Phi(x, 0)$ is represented by the matrix that has as its first $K$ columns
the independent columns of $DF(x)$ that are tangent to $M$ at $F(x)$, and that has as
its last $N - K$ columns the vectors $V_1(x), V_2(x), \dots, V_{N-K}(x)$ that are independent
vectors normal to $M$ at $F(x)$; thus we see that $D\Phi(x, 0)$ is nonsingular for
each $x \in W'$. Finally, we apply the inverse function theorem to conclude that $\Phi$
is one-to-one in a neighborhood of $W' \times \{0\}$, as claimed in (4). □
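
For a concrete instance of the map (4.25), take $M$ to be the unit circle in $\mathbb{R}^2$, so $K = 1$ and $N = 2$. The parametrization $F$ and the outward unit normal $V_1$ below are our own choices for this sketch; with them the Jacobian determinant of $\Phi$ can be computed by hand:

```latex
\[
  F(x) = (\cos x,\ \sin x), \qquad V_1(x) = (\cos x,\ \sin x), \qquad
  \Phi(x, y) = F(x) + y\, V_1(x) = (1 + y)(\cos x,\ \sin x),
\]
\[
  D\Phi(x, y) =
  \begin{pmatrix}
    -(1 + y)\sin x & \cos x \\
    \phantom{-}(1 + y)\cos x & \sin x
  \end{pmatrix},
  \qquad
  \det D\Phi(x, y) = -(1 + y).
\]
```

So $D\Phi(x, 0)$ is nonsingular for every $x$, in agreement with the proof, while the determinant vanishes at $y = -1$, where every normal line passes through the center of the circle and $\Phi$ ceases to be one-to-one.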


The following definition was introduced in Federer [Fe 59]. It will allow us to
generalize the preceding result.


Definition 4.4.8 A set $S \subset \mathbb{R}^N$ is said to be of positive reach if there exists $r > 0$
such that,

    for all $x \in \mathbb{R}^N$, $\operatorname{dist}(x, S) < r$ implies there is a
    unique point $\xi \in S$ with $|x - \xi| = \operatorname{dist}(x, S)$.          (4.28)

In case $S$ is of positive reach, we define the reach of $S$ to be the supremum of all
$r > 0$ for which the condition (4.28) is true.


If one examines the proof of Lemma 4.4.7, one will note that the only place
in which we used the fact that $M$ was at least $C^2$ was to apply the inverse function
theorem to conclude that $\Phi$ is one-to-one in a neighborhood of $W' \times \{0\}$. By
adding the hypothesis that $M$ is of positive reach, we can obtain the same conclusion
in the $C^1$ case. These remarks give us the following corollary of the proof of
Lemma 4.4.7.
Corollary 4.4.9 If $M \subset \mathbb{R}^N$ is a compact $C^1$ submanifold of dimension $K$, where
we assume $N - 1 \ge K \ge 1$, and if $M$ has positive reach, then for each point $P$ in
$M$ there exist a neighborhood $W$ of $P$ in $M$, a $C^k$ function $F : W' \to \mathbb{R}^N$, where
$W' \subset \mathbb{R}^K$ is open, and an orthonormal set of $C^{k-1}$ vector fields $V_j : W' \to \mathbb{R}^N$,
$j = 1, 2, \dots, N - K$, such that


(1) $F(W') = W$,

(2) $\operatorname{rank}(DF) = K$ at every point of $W'$,

(3) $V_j(x)$ is normal to $M$ at $F(x)$ for each $x \in W'$ and for each $j = 1, 2, \dots,
N - K$,

(4) $\Phi : W' \times \mathbb{R}^{N-K} \to \mathbb{R}^N$ defined by

$$\Phi(x, y) = F(x) + \sum_{j=1}^{N-K} y_j V_j(x), \tag{4.29}$$

for $x \in W'$ and $y = (y_1, y_2, \dots, y_{N-K}) \in \mathbb{R}^{N-K}$, is a $C^{k-1}$ function,
is one-to-one on $\{x\} \times \mathbb{R}^{N-K}$ for each $x \in W'$, and is one-to-one in a
neighborhood of $W' \times \{0\}$.


Theorem 4.4.10 If $M \subset \mathbb{R}^N$ is a compact $C^k$ submanifold of dimension $K$,
where $k \ge 2$ and $N - 1 \ge K \ge 1$, then $M$ has positive reach and there is a
neighborhood $U$ of $M$ on which $\xi : U \to M$ is $C^{k-1}$.


Proof. Suppose that $M$ is not of positive reach. Then there exist sequences of
points $X_1, X_2, \dots$ in $\mathbb{R}^N$, $P_1, P_2, \dots$ in $M$, and $Q_1, Q_2, \dots$ in $M$ with

$$0 = \lim_{i \to \infty} \operatorname{dist}(X_i, M), \tag{4.30}$$

and such that, for each $i = 1, 2, \dots$,

$$P_i \neq Q_i \text{ and } |X_i - P_i| = |X_i - Q_i| = \operatorname{dist}(X_i, M) \tag{4.31}$$

hold. By the compactness of $M$, we can pass to a subsequence, but without changing
notation, so that

$$\lim_{i \to \infty} X_i = \lim_{i \to \infty} P_i = \lim_{i \to \infty} Q_i = P \in M. \tag{4.32}$$

We apply Lemma 4.4.7 at $P$ to obtain the neighborhood $W$ of $P$ in $M$, the function
$F : W' \to \mathbb{R}^N$, and the function $\Phi : W' \times \mathbb{R}^{N-K} \to \mathbb{R}^N$. Letting $x_0 \in W'$
be such that $F(x_0) = P$, we can use Lemma 4.4.7(4) to choose a neighborhood
$U_P$ of $P$ in $\mathbb{R}^N$ so that $\Phi$ is one-to-one on $U_P$.

By (4.32), for all large enough $i$, we have $P_i, Q_i, X_i \in U_P$, contradicting the
fact that $\Phi$ is one-to-one there. Thus $M$ is of positive reach.

Note that on $U_P$ the nearest point retraction is $F \circ \Pi \circ \Phi^{-1}$, where $\Pi$ is
projection onto the first factor. Therefore $\xi$ is a $C^{k-1}$ function on $U_P$. Since the same
reasoning can be applied at any point $P \in M$, we see that $\xi$ is $C^{k-1}$ in a
neighborhood of $M$. □
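
The factorization of the nearest point retraction as $F \circ \Pi \circ \Phi^{-1}$ can be tried out on the unit circle in $\mathbb{R}^2$. The Python sketch below is our own illustration: the parametrization `F`, the outward normal used in `Phi`, the closed-form inverse `Phi_inverse`, and the brute-force comparison on a sampled circle are all assumptions made for this example.

```python
import math


def F(theta):
    # Parametrization of the unit circle.
    return (math.cos(theta), math.sin(theta))


def Phi(theta, y):
    # Tubular map F(theta) + y * V(theta) with V the outward unit normal.
    return ((1.0 + y) * math.cos(theta), (1.0 + y) * math.sin(theta))


def Phi_inverse(X):
    # Inverse of Phi away from the origin: angle and signed radial offset.
    return math.atan2(X[1], X[0]), math.hypot(X[0], X[1]) - 1.0


def nearest_point(X):
    # F o Pi o Phi^{-1}: keep the first coordinate of Phi^{-1}(X), then apply F.
    theta, _ = Phi_inverse(X)
    return F(theta)


def brute_force_nearest(X, samples=20000):
    # Nearest point among equally spaced samples of the circle.
    candidates = (F(2.0 * math.pi * k / samples) for k in range(samples))
    return min(candidates, key=lambda p: math.dist(p, X))


if __name__ == "__main__":
    X = (1.2, 0.5)
    print(nearest_point(X), brute_force_nearest(X))
```

For the circle the retraction is simply $X \mapsto X/|X|$, which is smooth away from the center, consistent with the circle having reach 1.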
Corollary 4.4.11 (Foote [Fo 84]) If $M \subset \mathbb{R}^N$ is a compact $C^k$ submanifold of
dimension $K$, where $k \ge 2$ and $N - 1 \ge K \ge 1$, then there is a neighborhood $U$
of $M$ on which the distance function

$$\varrho(x) = \operatorname{dist}(x, M)$$

is $C^k$ on $U \setminus M$.

Proof. By Theorem 4.4.6, the distance function is continuously differentiable on
the interior of $\Xi(M) \setminus M$. Any directional derivative of the distance function is
given by Equation (4.24) in Lemma 4.4.4, but by Theorem 4.4.10, that directional
derivative is a $C^{k-1}$ function on $U \setminus M$. Thus, $\varrho$ itself is $C^k$. □


The following example shows us that, for a surface that is less smooth than $C^2$,
everything can go wrong: The surface need not be of positive reach and the
distance function can fail to be differentiable at points arbitrarily near to the surface.


Example 4.4.12 Let $0 < \epsilon < 1$ be fixed. Form a simple closed curve $\gamma$ in $\mathbb{R}^2$ by
smoothly connecting the endpoints of the graph of

$$y = |x|^{2-\epsilon}, \qquad -1 \le x \le 1. \tag{4.33}$$

Consider $0 < b$ small enough that any point $(x_0, y_0)$ such that $\operatorname{dist}[(0, b), \gamma] =
|(0, b) - (x_0, y_0)|$ is a point on the graph (4.33). Notice that for small $|x| > 0$ it
holds that

$$|x|^{\epsilon} + |x|^{2-\epsilon} < 2b,$$

so

$$|x|^{2-\epsilon}\bigl(|x|^{\epsilon} + |x|^{2-\epsilon}\bigr) + b^2 < 2b\,|x|^{2-\epsilon} + b^2.$$

Thus, for any such small $|x| > 0$, we have

$$|x|^2 + \bigl(|x|^{2-\epsilon}\bigr)^2 - 2|x|^{2-\epsilon}\, b + b^2 < b^2$$

or

$$|(0, b) - (x, |x|^{2-\epsilon})| < |(0, b) - (0, 0)|. \tag{4.34}$$

Because of (4.34), we conclude that $\operatorname{dist}[(0, b), \gamma] < |(0, b) - (0, 0)|$ and by the
symmetry of the graph (4.33) there must be at least two points $(x_0, |x_0|^{2-\epsilon})$ and
$(-x_0, |x_0|^{2-\epsilon})$, where $x_0 > 0$, for which

$$\operatorname{dist}[(0, b), \gamma] = |(0, b) - (x_0, |x_0|^{2-\epsilon})| = |(0, b) - (-x_0, |x_0|^{2-\epsilon})|$$

holds.

Now, at any point such as $(0, b)$ above for which there are two or more distinct
nearest points, the distance function must fail to be differentiable. This is so
because a function that is differentiable at a point and for which the gradient does
not vanish has a well-defined direction of most rapid decrease, but when there are
two distinct nearest points, there are two corresponding directions of most rapid
decrease for the distance function.

In fact, the situation is even worse in this example. By taking $\epsilon = 1/2$ for
simplicity and taking $b$ small enough as above, one can compute (letting $\delta$ denote
distance to the curve) that

$$\lim_{h \downarrow 0} \frac{\delta[(h, b)] - \delta[(0, b)]}{h} \neq \lim_{h \uparrow 0} \frac{\delta[(h, b)] - \delta[(0, b)]}{h},$$

so $\partial \delta / \partial x$ does not exist at $(0, b)$. □
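
The two nearest points in Example 4.4.12 can be located numerically. The Python sketch below takes $\epsilon = 1/2$ as in the text; the value $b = 0.1$, the grid resolution, and the function names are our own assumptions, and only the graph (4.33) itself is searched (the rest of the closed curve is assumed to be farther away). It searches the half of the graph with $x \ge 0$ and relies on the symmetry $x \mapsto -x$ for the mirror point.

```python
import math


def squared_distance_to_graph_point(x, b, eps=0.5):
    # Squared distance from (0, b) to the point (x, |x|^(2 - eps)) on the graph.
    y = abs(x) ** (2.0 - eps)
    return x * x + (y - b) ** 2


def nearest_point_on_half_graph(b, eps=0.5, n=10000):
    """Grid search over 0 <= x <= 1; returns (x_star, distance)."""
    best_x = 0.0
    best_d2 = squared_distance_to_graph_point(0.0, b, eps)
    for k in range(1, n + 1):
        x = k / n
        d2 = squared_distance_to_graph_point(x, b, eps)
        if d2 < best_d2:
            best_x, best_d2 = x, d2
    return best_x, math.sqrt(best_d2)


if __name__ == "__main__":
    b = 0.1
    x_star, d = nearest_point_on_half_graph(b)
    # The minimizer is strictly positive and the distance is below b, so the
    # points with abscissas x_star and -x_star are two distinct nearest points.
    print(x_star, d, d < b)
```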


The Generalized Distance Function 


In our results concerning the smoothness of the distance function, we have so
far rather conspicuously avoided the points of $M$ itself. In fact, the (unsigned)
distance function, without some modification, must fail to be differentiable at all
points of $M$. The surprising thing is that in the cases $K = N - 1$ and $K = 1$, it
is possible to modify the distance function $\varrho$ so that the resulting function
is still as smooth as the manifold $M$, but on an entire neighborhood of $M$. To
achieve this end, we need an improved form of the Local Tubular Neighborhood
Lemma 4.4.7. As far as we know, the key part of this improved result, (2), can
only be obtained for curves, $K = 1$, and hypersurfaces, $K = N - 1$, and not for
the intervening dimensions, $1 < K < N - 1$, of the submanifold.


Lemma 4.4.13 If $M \subset \mathbb{R}^N$ is a compact $C^k$ submanifold of dimension $K$ equal
to either 1 or $N - 1$, where $k \ge 2$ and $N \ge 2$, then for each point $P$ in $M$
there exist a neighborhood $W$ of $P$ in $M$, a $C^k$ function $F : W' \to \mathbb{R}^N$, where
$W' \subset \mathbb{R}^K$ is open, and an orthonormal set of $C^{k-1}$ vector fields $V_j : W' \to \mathbb{R}^N$,
$j = 1, 2, \dots, N - K$, such that


(1) $V_j(x)$ is normal to $M$ at $F(x)$ for each $x \in W'$ and for each $j = 1, 2, \dots,
N - K$,

(2) for each $P \in M$ and for each $j = 1, 2, \dots, N - K$, the directional derivative
of $V_j$ in any direction tangent to $M$ is again tangent to $M$,

(3) $\Phi : W' \times \mathbb{R}^{N-K} \to \mathbb{R}^N$ defined by

$$\Phi(x, y) = F(x) + \sum_{j=1}^{N-K} y_j V_j(x), \tag{4.35}$$

for $x \in W'$ and $y = (y_1, y_2, \dots, y_{N-K}) \in \mathbb{R}^{N-K}$, is a $C^{k-1}$ function,
is one-to-one on $\{x\} \times \mathbb{R}^{N-K}$ for each $x \in W'$, and is one-to-one in a
neighborhood of $W' \times \{0\}$.


Proof. Proceed as in the proof of Lemma 4.4.7 to obtain $W$, $F$, and the orthonormal
vector fields $V_j : W' \to \mathbb{R}^N$, $j = 1, 2, \dots, N - K$. Suppose without loss of
generality that $0 \in W'$ and $F(0) = P$.




In case $K = N - 1$, the condition (2) follows from the fact that $V_1 \cdot V_1 = 1$;
more precisely, differentiation of both sides of the equation $V_1 \cdot V_1 = 1$ in
a tangent direction implies that the derivative of $V_1$ in that direction must be
orthogonal to $V_1$, and hence that directional derivative of $V_1$ is tangent to $M$.

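Now, assume that $K = 1$. We apply the Fundamental Existence Theorem for
ordinary differential equations to find a set of functions

\[
\begin{array}{llll}
\varphi_{1,1}, & \varphi_{1,2}, & \ldots, & \varphi_{1,N-1}, \\
\varphi_{2,1}, & \varphi_{2,2}, & \ldots, & \varphi_{2,N-1}, \\
\vdots & & & \vdots \\
\varphi_{N-1,1}, & \varphi_{N-1,2}, & \ldots, & \varphi_{N-1,N-1},
\end{array}
\]

satisfying

\[ (1 + \varphi_{j,j})\,\varphi_{i,j}' + \sum_{\ell \ne j} \varphi_{j,\ell}\,\varphi_{i,\ell}'
   + \sum_{\ell,m=1}^{N-1} (\delta_{i\ell} + \varphi_{i,\ell})(\delta_{jm} + \varphi_{j,m})\, V_\ell' \cdot V_m = 0, \tag{4.36} \]

\[ \varphi_{i,j}(0) = 0, \tag{4.37} \]

for $i = 1, 2, \ldots, N-1$ and $j = 1, 2, \ldots, N-1$. Here $\delta_{ij}$ is the standard
Kronecker delta. Noting that the first-order differential equations in (4.36) involve
data that are $C^{k-2}$, we conclude that the solutions $\varphi_{i,j}$ are $C^{k-1}$. Replacing $W$ and
$W'$ by smaller neighborhoods if necessary, but without changing notation, we may
assume the functions $\varphi_{i,j}$ are defined and satisfy (4.36) on all of $W'$.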


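Set

\[ \tilde{V}_i(x) = \sum_{\ell=1}^{N-1} \bigl(\delta_{i\ell} + \varphi_{i,\ell}(x)\bigr) V_\ell(x). \]

The vector fields $\tilde{V}_i$ are $C^{k-1}$, because the $\varphi_{i,\ell}$ and the $V_\ell$ are $C^{k-1}$. By (4.37), we
have $\tilde{V}_i(0) = V_i(0)$, so at $x = 0$ the system of vectors $\tilde{V}_1(0), \tilde{V}_2(0), \ldots, \tilde{V}_{N-1}(0)$
is orthonormal. We compute

\begin{align*}
\tilde{V}_i' \cdot \tilde{V}_j
&= \Bigl( \sum_{\ell=1}^{N-1} (\delta_{i\ell} + \varphi_{i,\ell}) V_\ell \Bigr)' \cdot
   \Bigl( \sum_{m=1}^{N-1} (\delta_{jm} + \varphi_{j,m}) V_m \Bigr) \\
&= \Bigl( \sum_{\ell=1}^{N-1} \varphi_{i,\ell}' V_\ell \Bigr) \cdot
   \Bigl( \sum_{m=1}^{N-1} (\delta_{jm} + \varphi_{j,m}) V_m \Bigr)
   + \sum_{\ell,m=1}^{N-1} (\delta_{i\ell} + \varphi_{i,\ell})(\delta_{jm} + \varphi_{j,m})\, V_\ell' \cdot V_m \\
&= \sum_{\ell=1}^{N-1} \varphi_{i,\ell}' V_\ell \cdot \Bigl( V_j + \sum_{m=1}^{N-1} \varphi_{j,m} V_m \Bigr)
   + \sum_{\ell,m=1}^{N-1} (\delta_{i\ell} + \varphi_{i,\ell})(\delta_{jm} + \varphi_{j,m})\, V_\ell' \cdot V_m \\
&= (1 + \varphi_{j,j})\,\varphi_{i,j}' + \sum_{\ell \ne j} \varphi_{j,\ell}\,\varphi_{i,\ell}'
   + \sum_{\ell,m=1}^{N-1} (\delta_{i\ell} + \varphi_{i,\ell})(\delta_{jm} + \varphi_{j,m})\, V_\ell' \cdot V_m \\
&= 0,
\end{align*}

where the last equality is (4.36).
It follows that $\tilde{V}_1(x), \tilde{V}_2(x), \ldots, \tilde{V}_{N-1}(x)$ is an orthonormal system for all $x \in W'$
and that (2) holds as required.
The remainder of the proof proceeds as for Lemma 4.4.7. □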


Definition 4.4.14 Suppose $M \subset \mathbb{R}^N$ is a compact $C^k$ submanifold of dimension
either 1 or $N - 1$, where $k \ge 2$ and $N \ge 2$. Let the $C^{k-1}$ function
$\Phi : W' \times \mathbb{R}^{N-K} \to \mathbb{R}^N$ be as in Lemma 4.4.13(3). By the inverse function theorem, there
is a neighborhood $U$ of $W$ on which $\Phi$ is invertible. Define $\tilde{\varrho} : U \to \mathbb{R}^{N-K}$ by
setting

\[ \tilde{\varrho}(Q) = \Pi_2 \circ \Phi^{-1}(Q), \tag{4.38} \]

for $Q \in U$, where $\Pi_2 : W' \times \mathbb{R}^{N-K} \to \mathbb{R}^{N-K}$ is projection onto the second
factor.


Theorem 4.4.15 If $M \subset \mathbb{R}^N$ is a compact $C^k$ submanifold of dimension $K$ equal
to either 1 or $N - 1$, where $k \ge 2$ and $N \ge 2$, then $\tilde{\varrho}$ defined in (4.38) is $C^k$ and
satisfies

\[ |\tilde{\varrho}(Q)| = \varrho(Q), \tag{4.39} \]

for $Q \in U$, where $U$ is as in Definition 4.4.14.
Remark 4.4.16 


(1) In case $K = N - 1$, the function $\tilde{\varrho}$ defined in (4.38) is called the signed
distance function. In case $K = 1$, we can consider $\tilde{\varrho}$ to be the generalization
of the signed distance function, as is justified by (4.39).


(2) If the function $\Phi$ from Lemma 4.4.7 were used in Definition 4.4.14, instead
of $\Phi$ from Lemma 4.4.13, then equation (4.39) would still hold, but our
proof that $\tilde{\varrho}$ is $C^k$ would no longer be valid.


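Proof of Theorem 4.4.15. For each $j = 1, 2, \ldots, N-K$, define $\tilde{\varrho}_j : U \to \mathbb{R}$
by setting

\[ \tilde{\varrho}_j(x) = [x - \xi(x)] \cdot V_j[\xi(x)]. \]

Using $I$ to denote the identity map on $\mathbb{R}^N$, we compute⁶

\begin{align}
\langle D\tilde{\varrho}_j(x), V_\ell[\xi(x)] \rangle
&= \langle I - D\xi(x), V_\ell[\xi(x)] \rangle \cdot V_j[\xi(x)]
   + [x - \xi(x)] \cdot \langle D(V_j \circ \xi)(x), V_\ell[\xi(x)] \rangle \notag \\
&= V_\ell[\xi(x)] \cdot V_j[\xi(x)] \tag{4.40} \\
&\quad - \langle D\xi(x), V_\ell[\xi(x)] \rangle \cdot V_j[\xi(x)] \tag{4.41} \\
&\quad + [x - \xi(x)] \cdot \bigl\langle DV_j[\xi(x)], \langle D\xi(x), V_\ell[\xi(x)] \rangle \bigr\rangle. \tag{4.42}
\end{align}

We examine the terms (4.40)-(4.42) in turn. For (4.40), we have

\[ V_\ell[\xi(x)] \cdot V_j[\xi(x)] = \delta_{j\ell}. \]

Since $\xi(x)$ is always in $M$, $D\xi(x)$ applied to anything must be a tangent vector.
Thus, for (4.41), we have

\[ \langle D\xi(x), V_\ell[\xi(x)] \rangle \cdot V_j[\xi(x)] = 0. \]

But even more is true. When we move in a normal direction, the nearest point
does not change, so we have

\[ \langle D\xi(x), V_\ell[\xi(x)] \rangle = 0. \]

Thus we have

\[ \bigl\langle DV_j[\xi(x)], \langle D\xi(x), V_\ell[\xi(x)] \rangle \bigr\rangle = 0, \]

so (4.42) vanishes.
Combining the values for (4.40)-(4.42), we obtain

\[ \langle D\tilde{\varrho}_j(x), V_\ell[\xi(x)] \rangle = \delta_{j\ell}. \tag{4.43} \]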
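Next we consider any tangent direction $T$. We compute

\begin{align}
\langle D\tilde{\varrho}_j(x), T[\xi(x)] \rangle
&= \langle I - D\xi(x), T[\xi(x)] \rangle \cdot V_j[\xi(x)]
   + [x - \xi(x)] \cdot \langle D(V_j \circ \xi)(x), T[\xi(x)] \rangle \notag \\
&= T[\xi(x)] \cdot V_j[\xi(x)] \tag{4.44} \\
&\quad - \langle D\xi(x), T[\xi(x)] \rangle \cdot V_j[\xi(x)] \tag{4.45} \\
&\quad + [x - \xi(x)] \cdot \bigl\langle DV_j[\xi(x)], \langle D\xi(x), T[\xi(x)] \rangle \bigr\rangle. \tag{4.46}
\end{align}


⁶ Recall from Section 3.3 that we use the notation $\langle\ ,\ \rangle$ to denote the application of the Jacobian
matrix to a vector.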
We see that the terms (4.44)-(4.46) vanish as follows:

(4.44) This term is trivially zero as it is the inner product of a unit tangent vector
and a normal vector.

(4.45) Since $\xi = F \circ \Pi_1 \circ \Phi^{-1}$, the image of $D\xi$ must lie in the image of $DF$,
that is, it must lie in the tangent space. Thus this term is also the inner product of
a tangent vector and a normal vector.

(4.46) This term vanishes because $x - \xi(x)$ is a normal vector and

\[ \bigl\langle DV_j[\xi(x)], \langle D\xi(x), T[\xi(x)] \rangle \bigr\rangle \]

is a tangent vector, as guaranteed by Lemma 4.4.13(2).
Combining the values for (4.44)-(4.46), we obtain

\[ \langle D\tilde{\varrho}_j(x), T[\xi(x)] \rangle = 0. \tag{4.47} \]

Let $T_1, T_2, \ldots, T_K$ be an orthonormal set of $C^{k-1}$ tangent vector fields. From (4.43)
and (4.47), we know how to represent $D\tilde{\varrho}$ in terms of the basis

\[ \{T_1, T_2, \ldots, T_K, V_1, V_2, \ldots, V_{N-K}\} \tag{4.48} \]

for $\mathbb{R}^N$. Specifically, in terms of the basis (4.48), we have

\[ \begin{pmatrix} 0_{K \times K} & 0_{K \times (N-K)} \\ 0_{(N-K) \times K} & I_{(N-K) \times (N-K)} \end{pmatrix}, \tag{4.49} \]

where $0$ and $I$ denote the zero and identity matrices of the indicated sizes. As
the final step, we change basis in $\mathbb{R}^N$ from (4.48) to the standard basis. Since this
change of basis is a $C^{k-1}$ operation, we see that $D\tilde{\varrho}$ is $C^{k-1}$, and thus $\tilde{\varrho}$ is $C^k$. □


One may ask what happens in case $M$ is only $C^1$. It no longer makes any sense
to demand that the tangential derivatives of the normal vectors be tangent, because
such derivatives need not exist. Nonetheless, when $K = N - 1$, the definition of
$\tilde{\varrho}$ does not really require Lemma 4.4.13; in fact, Corollary 4.4.9 is sufficient. We
have the following result.


Theorem 4.4.17 If $M \subset \mathbb{R}^N$ is a compact $C^1$ submanifold of dimension $N - 1$
and if $M$ has positive reach, then $\tilde{\varrho}$ is $C^1$ in a neighborhood of $M$.


Proof. Let $V$ be the unit normal field chosen so that

\[ \tilde{\varrho}(x) = (x - \xi(x)) \cdot V[\xi(x)]. \]

By Lemma 4.4.4 and Theorem 4.4.6, and the fact that

\[ \tilde{\varrho}(x) = (x - \xi(x)) \cdot V[\xi(x)]
   = \operatorname{sign}\bigl[ (x - \xi(x)) \cdot V[\xi(x)] \bigr]\, \varrho(x), \]

we see that, as long as $x$ is in the interior of $\Xi(M) \setminus M$, then

\[ \langle D\tilde{\varrho}(x), V[\xi(x)] \rangle = 1. \]

Similarly,

\[ \langle D\tilde{\varrho}(x), T \rangle = 0 \]

holds for any direction $T$ that is tangent to $M$ at $\xi(x)$. Thus

\[ D\tilde{\varrho} = V[\xi(x)] \tag{4.50} \]

holds on the interior of $\Xi(M) \setminus M$. Repeating the argument used in the proof of
Theorem 4.4.6, we see that (4.50) extends to the interior of $\Xi(M)$. Since (4.50)
shows $D\tilde{\varrho}$ to be continuous, we conclude that $\tilde{\varrho}$ is $C^1$ in the interior of $\Xi(M)$,
which is a neighborhood of $M$ since $M$ has positive reach. □
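As an illustration of the formula for the signed distance and of (4.50), the following Python sketch treats the simplest hypersurface, the unit circle in the plane. The choice of example, the outward orientation of the normal, and all function names are assumptions made here for illustration and are not taken from the text. For the circle the nearest-point map is $x \mapsto x/|x|$, so the signed distance reduces to $|x| - 1$, and a finite-difference gradient should agree with the unit normal at the nearest point.

```python
import math

def nearest_point(x):
    """Nearest-point map for the unit circle in R^2 (defined for x != 0)."""
    r = math.hypot(x[0], x[1])
    return (x[0] / r, x[1] / r)

def unit_normal(p):
    """Outward unit normal at a point p of the unit circle (an orientation chosen here)."""
    return p

def signed_distance(x):
    """(x - xi(x)) . V[xi(x)]; for the unit circle this equals |x| - 1."""
    p = nearest_point(x)
    v = unit_normal(p)
    return (x[0] - p[0]) * v[0] + (x[1] - p[1]) * v[1]

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    gx = (f((x[0] + h, x[1])) - f((x[0] - h, x[1]))) / (2 * h)
    gy = (f((x[0], x[1] + h)) - f((x[0], x[1] - h))) / (2 * h)
    return (gx, gy)
```

Evaluating `numerical_gradient(signed_distance, x)` at a point inside or outside the circle (but away from the center, where the nearest point is not unique) gives a vector close to `unit_normal(nearest_point(x))`, which is the content of (4.50) in this special case.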


5


Variations and Generalizations 


5.1 The Weierstrass Preparation Theorem 


In Section 2.2, we described the method Newton devised for examining the local 
behavior of the locus of points satisfying a polynomial equation in two variables. 
It is easy to extend that method to the locus of an equation of the form

\[ z^m + a_{m-1}(w)\, z^{m-1} + a_{m-2}(w)\, z^{m-2} + \cdots + a_0(w) = 0, \tag{5.1} \]

where each $a_j(w)$ is a holomorphic function of $w \in \mathbb{C}$ that vanishes at $w = 0$.
Such an extension is significant, because the Weierstrass preparation theorem will 
show us that the behavior near (0, 0) of the locus of an equation of the form given 
in (5.1) is completely representative of the local behavior of the locus of points 
satisfying F(w, z) = 0, where F is holomorphic. 

Suppose $F(w, z)$, $(w, z) \in \mathbb{C}^n \times \mathbb{C}$, is holomorphic in a neighborhood of the
origin, is not identically zero, and satisfies F(0,0) = 0. To study the locus of 
the equation F(w, z) = 0 near the origin, we would apply the implicit function 
theorem if possible, but when the linear term in the Taylor series for F vanishes, 
the use of the implicit function theorem is not possible. Instead, the tool that can 
be used is the Weierstrass preparation theorem. 

As a very simple example, consider the locus of points satisfying 


\[ \sin(z^2 - w) = 0 \tag{5.2} \]

near the origin. We can write

\[ \sin(z^2 - w) = U(z^2 - w) \cdot (z^2 - w) \]

with $U(\xi) = \sin(\xi)/\xi$ (extended to equal 1 at the removable singularity $\xi = 0$).
Since $U(\xi)$ is nonvanishing in a neighborhood of $\xi = 0$, there is a neighborhood
of $(0, 0)$ in which $U(z^2 - w)$ is nonvanishing and, in that neighborhood, the locus
of points in $\mathbb{C}^2$ satisfying (5.2) is identical to the locus of points satisfying

\[ z^2 - w = 0. \]

This example worked out so easily because one could see in an obvious way
how to factor out the nonvanishing function $U(z^2 - w)$. In the next example,
the nonvanishing factor is less obvious. The more important aspect of the next
example is that it illustrates a systematic procedure for constructing $U$. Following
Hörmander [Ho 66], we will use that procedure for the main construction in the
proof of the Weierstrass preparation theorem.
Example 5.1.1 Consider the locus of points satisfying

\[ \frac{z^2}{1 + z^2} + w = 0 \tag{5.3} \]

near the origin.
We will construct a function $S(w, z)$ that is nonvanishing and holomorphic in
a neighborhood of the origin and that satisfies

\[ z^2 + a_1(w)\, z + a_0(w) = \bigl( z^2 + w(1 + z^2) \bigr)\, S(w, z), \tag{5.4} \]

with $a_1(w)$ and $a_0(w)$ both holomorphic and with $a_1(0) = a_0(0) = 0$. Once those
functions are constructed, we can set

\[ U(w, z) = (1 + z^2)^{-1}\, S(w, z)^{-1} \]

and multiply both sides of (5.4) by $U$ to see that

\[ U(w, z)\, \bigl( z^2 + a_1(w)\, z + a_0(w) \bigr) = \frac{z^2}{1 + z^2} + w \]

holds. We conclude that, in a neighborhood of the origin, the locus of points
satisfying (5.3) is identical to the locus of points satisfying

\[ z^2 + a_1(w)\, z + a_0(w) = 0. \tag{5.5} \]

The functions in (5.4) will be constructed iteratively by setting

\[ S_0 = 1, \qquad R_0 = 0 \]

and, for $k = 1, 2, \ldots$, solving

\[ z^2 - w(1 + z^2)\, S_{k-1}(w, z) = z^2\, S_k(w, z) + R_k(w, z), \tag{5.6} \]

where $R_k(w, z)$ is the remainder after the left-hand side of (5.6) is divided by $z^2$,
that is, $R_k(w, z)$ is a linear expression in $z$ with coefficients that are holomorphic
in $w$ and vanish at $w = 0$.
One computes

\[ z^2 - w(1 + z^2)\, S_0 = z^2 - w(1 + z^2) = z^2 (1 - w) - w, \]

so

\[ S_1 = 1 - w, \qquad R_1 = -w. \]

Next, we compute

\[ z^2 - w(1 + z^2)\, S_1 = z^2 - w(1 + z^2)(1 - w) = z^2 (1 - w + w^2) - w + w^2, \]

so

\[ S_2 = 1 - w + w^2, \qquad R_2 = -w + w^2. \]

In general, one checks that

\[ S_k(w, z) = \sum_{j=0}^{k} (-w)^j, \qquad R_k(w, z) = \sum_{j=1}^{k} (-1)^j w^j. \]

Since $S_k$ converges as $k \to \infty$ to $(1 + w)^{-1}$ for $|w| < 1$, we conclude that

\[ S(w, z) = S(w) = (1 + w)^{-1}. \]

From (5.4) one can deduce that

\[ a_1(w)\, z + a_0(w) = w/(1 + w). \]

(It is particular to this example that $S$ depends only on $w$ and that $a_1 = 0$.)
Alternatively, one can obtain the same result by using $R_k = \sum_{j=1}^{k} (-1)^j w^j$ and
observing that

\[ a_1(w)\, z + a_0(w) = -\lim_{k \to \infty} R_k(w) = w/(1 + w). \]

We see that, for $|w| < 1$ and $|z| < 1$, the locus of points satisfying (5.3) is
identical to the locus of points satisfying

\[ z^2 + \frac{w}{1 + w} = 0. \]

□
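The iteration (5.6) is mechanical enough to run by machine. The following Python sketch, which is an illustration added here and not part of the original construction, represents a polynomial in $(w, z)$ as a dictionary mapping exponent pairs $(i, j)$ to the integer coefficient of $w^i z^j$; the helper names `poly_mul`, `poly_sub`, `iterate`, and `run` are hypothetical. Division by $z^2$ simply splits the left-hand side of (5.6) into the terms of $z$-degree at least 2 (which give $S_k$) and the rest (which give $R_k$).

```python
def poly_mul(p, q):
    """Product of two polynomials stored as {(i, j): coeff} for w^i z^j."""
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {k: c for k, c in out.items() if c != 0}

def poly_sub(p, q):
    """Difference p - q of two polynomials in the same representation."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) - c
    return {k: c for k, c in out.items() if c != 0}

Z2 = {(0, 2): 1}                    # z^2
W_FACTOR = {(1, 0): 1, (1, 2): 1}   # w (1 + z^2)

def iterate(S_prev):
    """One step of (5.6): return (S_k, R_k) from S_{k-1}."""
    lhs = poly_sub(Z2, poly_mul(W_FACTOR, S_prev))
    # Terms divisible by z^2 form z^2 * S_k; the remaining terms form R_k.
    S = {(i, j - 2): c for (i, j), c in lhs.items() if j >= 2}
    R = {(i, j): c for (i, j), c in lhs.items() if j < 2}
    return S, R

def run(steps):
    """Start from S_0 = 1, R_0 = 0 and apply (5.6) the given number of times."""
    S, R = {(0, 0): 1}, {}
    for _ in range(steps):
        S, R = iterate(S)
    return S, R
```

Calling `run(2)` reproduces $S_2 = 1 - w + w^2$ and $R_2 = -w + w^2$ from the example, and larger step counts produce the partial sums of the geometric series for $(1 + w)^{-1}$.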


The important feature of the polynomial $z^2 + a_1(w)\, z + a_0(w)$ in the example,
as well as of the polynomial in (5.1), is that they are monic polynomials in $z$ with
holomorphic coefficients that vanish at $w = 0$. This class of polynomials is named
in the next definition.

Definition 5.1.2 A function $W(w_1, w_2, \ldots, w_n, z)$, holomorphic in a neighborhood
of $0 \in \mathbb{C}^{n+1}$, is called a Weierstrass polynomial of degree $m$ if, writing
$w = (w_1, w_2, \ldots, w_n)$, we have

\[ W(w, z) = z^m + a_{m-1}(w)\, z^{m-1} + \cdots + a_1(w)\, z + a_0(w), \tag{5.7} \]

where each $a_j(w)$ is a holomorphic function in a neighborhood of $0 \in \mathbb{C}^n$ that
vanishes at $w = 0 \in \mathbb{C}^n$.


Theorem 5.1.3 (Weierstrass Preparation Theorem) Let $F(w, z)$, $w = (w_1,
w_2, \ldots, w_n)$, be holomorphic in a neighborhood of $0 \in \mathbb{C}^{n+1}$. Let $m$ be a positive
integer. If $F(0, z)/z^m$ is holomorphic in a neighborhood of $0 \in \mathbb{C}$ and is
non-zero at 0, then there exist a Weierstrass polynomial $W(w, z)$ of degree $m$
and a function $U(w, z)$ holomorphic and nonvanishing in a neighborhood $N$ of
$0 \in \mathbb{C}^{n+1}$ such that

\[ F = UW \tag{5.8} \]

holds in $N$.


The proof of the Weierstrass Preparation Theorem 5.1.3 requires an estimate 
which we isolate in the following lemma. 


Lemma 5.1.4 Let $m$ be a positive integer and let $\rho_j$, $j = 1, 2, \ldots, n + 1$, be
positive real numbers. Suppose

\[ z^m \varphi(w, z) + \psi(w, z) \]

is holomorphic on

\[ \Delta = \{ (w, z) : |w_j| < \rho_j,\ j = 1, 2, \ldots, n,\ |z| < \rho_{n+1} \} \]

and suppose that $\psi$ is a polynomial in $z$ of degree less than $m$ with coefficients
that are holomorphic functions of $w$. If

\[ B = \sup_{\Delta} |z^m \varphi(w, z) + \psi(w, z)| < \infty, \]

then

\[ \sup_{\Delta} |\psi(w, z)| \le m B \tag{5.9} \]

and

\[ \sup_{\Delta} |\varphi(w, z)| \le (m + 1)\, B\, (\rho_{n+1})^{-m} \tag{5.10} \]

hold.
Proof of Lemma 5.1.4. Setting $f(w, z) = z^m \varphi(w, z) + \psi(w, z)$, we have

\[ \psi(w, z) = \sum_{j=0}^{m-1} \frac{z^j}{j!}\, \frac{\partial^j f}{\partial z^j}(w, 0). \tag{5.11} \]

Applying the Cauchy estimates (see Lemma 2.4.3) for each fixed $w$ allows us to
conclude that

\[ \left| \frac{1}{j!}\, \frac{\partial^j f}{\partial z^j}(w, 0) \right| \le B\, (\rho_{n+1})^{-j} \tag{5.12} \]

holds, for $j = 0, 1, \ldots, m - 1$, so (5.9) follows from (5.11) and (5.12).
Now using (5.9) and selecting $r$ with $0 < r < \rho_{n+1}$, for fixed $w$, we have

\[ |\varphi(w, z)| \le |z^m \varphi(w, z)|\, r^{-m} \le (m + 1)\, B\, r^{-m} \quad \text{for } |z| = r. \tag{5.13} \]

By the maximum modulus principle and the choice of $r$, (5.10) follows from
(5.13). □


Proof of Theorem 5.1.3. Expressing $F$ as a power series and collecting terms
by powers of $z$, we can write

\[ F(w, z) = \sum_{j=0}^{\infty} f_j(w)\, z^j. \]

Set

\[ G(w, z) = \sum_{j=0}^{m-1} f_j(w)\, z^j \]

and

\[ H(w, z) = z^{-m} \sum_{j=m}^{\infty} f_j(w)\, z^j, \]

so that

\[ F = G + z^m H \]

and $H$ is holomorphic and nonvanishing in a neighborhood of $0 \in \mathbb{C}^{n+1}$.
As in Example 5.1.1, we will construct the function $S$, holomorphic and nonvanishing
in a neighborhood of $0 \in \mathbb{C}^{n+1}$, so that

\[ (z^m + G/H)\, S = W \tag{5.14} \]

is a Weierstrass polynomial. We can then define $U = H/S$ and the result will
follow.
The iterative construction is defined by setting, as in Example 5.1.1,

\[ S_0 = 1, \qquad R_0 = 0 \]

and, for $k = 1, 2, \ldots$, by solving

\[ z^m - (G/H)\, S_{k-1}(w, z) = z^m S_k(w, z) + R_k(w, z), \tag{5.15} \]

where we require $S_k$ to be holomorphic and nonvanishing in a neighborhood of
$0 \in \mathbb{C}^{n+1}$ and we require $R_k(w, z)$ to be a polynomial in $z$ of degree less than $m$
with coefficients that are holomorphic functions of $w$.

Now we need to show that the sequence of functions $S_k(w, z)$, $k = 0, 1, \ldots$,
is uniformly convergent in some neighborhood of $0 \in \mathbb{C}^{n+1}$. To establish this
fact, we will need to apply Lemma 5.1.4. We choose the positive real numbers
$\rho_1, \rho_2, \ldots, \rho_{n+1}$ so that

\[ \sup_{\Delta} |G/H| < \frac{(\rho_{n+1})^m}{3m + 1} \]

holds, where

\[ \Delta = \{ (w, z) : |w_j| < \rho_j,\ j = 1, 2, \ldots, n,\ |z| < \rho_{n+1} \}. \]

We can make such a choice because $G(0, z) = 0$.
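It follows from (5.15) that

\[ z^m \bigl( S_{k+1}(w, z) - S_k(w, z) \bigr) + \bigl( R_{k+1}(w, z) - R_k(w, z) \bigr)
   = -(G/H) \bigl( S_k(w, z) - S_{k-1}(w, z) \bigr) \tag{5.16} \]

holds. We conclude from Lemma 5.1.4 that

\[ \sup_{\Delta} |S_{k+1}(w, z) - S_k(w, z)| \le \frac{1}{2} \sup_{\Delta} |S_k(w, z) - S_{k-1}(w, z)|. \]

Thus

\[ S = \lim_{k \to \infty} S_k = S_0 + \sum_{k=1}^{\infty} (S_k - S_{k-1}) \]

is uniformly convergent and, by (5.15), so is

\[ R = \lim_{k \to \infty} R_k. \]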


Remark 5.1.5 


(1) An alternative, more elegant, but less elementary, proof of the Weierstrass
preparation theorem can be found in D'Angelo [DA 93]. Bers [Be 64] gives
proofs for both formal power series and convergent power series. An algebraic
version of the Weierstrass preparation theorem proved by a similar
iterative argument can be found in Gersten [Ge 83].

(2) For a function $F(w, z)$, $(w, z) \in \mathbb{C}^n \times \mathbb{C}$, that is holomorphic in a neighborhood
of the origin, is not identically zero, and satisfies $F(0, 0) = 0$, it is
always possible to make a linear change of variables so as to ensure that the
hypotheses of the Weierstrass Preparation Theorem 5.1.3 hold (exercise, or
see Bers [Be 64]).


5.2 Implicit Function Theorems without 
Differentiability 


In Section 3.4, we have already seen how to prove an implicit function theorem 
using the contraction mapping fixed point theorem. Here we will apply the more 
abstract Schauder fixed point theorem to obtain an implicit function theorem. The 
important practical distinction in the present approach is that we do not need 
differentiability. 

We begin by stating the fixed point theorem of Schauder, which is a generaliza- 
tion of the Brouwer fixed point theorem in Euclidean space. 


Theorem 5.2.1 (Schauder) Let $Y$ be a Banach space and let $K \subset Y$ be bounded,
closed, and convex. If $F : K \to K$ is a compact operator,¹ then there exists a
point $p \in K$ with $F(p) = p$.


A proof of Schauder’s theorem can be found in Zeidler [Ze 86]. The original 
reference is Schauder [Sc 30]. 

Note that there is no uniqueness of the fixed point guaranteed by Schauder’s 
theorem. Our application to proving an implicit function theorem will also not 
provide a mechanism for obtaining a unique implicit value. But, by definition, a 
function must be single-valued. The dilemma then is to extract an ordinary single- 
valued function from a function that is naturally set-valued. The Axiom of Choice 
always provides one way to make such a selection of values, but that approach 
is far too abstract (and rarely leads to continuous or even measurable functions). 
Instead we will appeal to a measurable selection theorem. The result we use is the 
following (see Wagner [Wa 77; Theorem 4.1]).


Theorem 5.2.2 If $X$ is a topological space, $Y$ is a complete, separable metric
space, $C_Y$ is the set of nonempty closed subsets of $Y$, and $\Phi : X \to C_Y$ is such
that, for each open $U \subset Y$,

\[ X \cap \{ x : \Phi(x) \cap U \ne \emptyset \} \]

is a Borel set, then there exists a Borel measurable function $f : X \to Y$ with the
property that

\[ f(x) \in \Phi(x) \]

holds for each $x \in X$.


¹ That is, F is continuous and F maps bounded sets to relatively compact sets.

Remark 5.2.3 In fact, there is an entire literature of measurable selection theorems
and, within that literature, a sub-topic of measurable implicit function theorems.
The interested reader should consult Wagner [Wa 77] and the extensive
bibliography therein.

We can now state and prove our theorem. 


Theorem 5.2.4 Let $X$ be a topological space. Let $Y$ be a separable Banach space
and let $K \subset Y$ be compact and convex.

If $F : X \times K \to K$ is continuous and if, for each $x \in X$, $F(x, \cdot)$ is a compact
operator as a function of $y \in K$, then


(1) for each $x \in X$,

\[ K \cap \{ y : y = F(x, y) \} \]

is a nonempty closed set,

(2) for each open $U \subset Y$,

\[ X \cap \{ x : \emptyset \ne U \cap K \cap \{ y : y = F(x, y) \} \} \]

is a Borel set, and

(3) there is a Borel measurable function $f : X \to Y$ such that

\[ f(x) = F(x, f(x)) \]

holds for all $x \in X$.

Proof.
(1) This is immediate from the continuity of $F$ and Schauder's theorem.
(2) Because $Y$ is separable and $K$ is compact, we can write

\[ U \cap K \cap \{ y : y = F(x, y) \} \]

as a countable union of compact sets, that is,

\[ U \cap K \cap \{ y : y = F(x, y) \} = \bigcup_{i=1}^{\infty} C_i \]

with each $C_i$ compact. We have

\[ X \cap \{ x : \emptyset \ne U \cap K \cap \{ y : y = F(x, y) \} \} = \bigcup_{i=1}^{\infty} \Pi(C_i), \]

where $\Pi : X \times Y \to X$ is projection onto the first factor, and, for each $i$, $\Pi(C_i)$
is compact. Thus,

\[ X \cap \{ x : \emptyset \ne U \cap K \cap \{ y : y = F(x, y) \} \} \]

is a Borel set.


(3) This conclusion follows from the measurable selection theorem (Theorem
5.2.2) above. □
5.3 An Inverse Function Theorem for
Continuous Mappings


Examples show (see Section 3.6) that even a slight weakening of the hypothesis
of $C^1$ smoothness in the classical implicit function theorem gives rise to a
failure of the result. Mathematics is, nevertheless, robust. One can weaken the
$C^1$ hypothesis and strengthen the properties of the mapping in other ways and
still derive a suitable implicit function theorem. This is what we do in the present
section. Our work here is inspired by that in Fečkan [Fe 94].

The device that we use to compensate for lack of smoothness is functional
analysis, in particular, the mapping properties of nonlinear operators presented in
Chapter 5 of Berger [Be 77]. Specifically, we will assume that our mapping is a
compact² perturbation of the identity.

For the first part of this section, we let $X$ be a reflexive Banach space. For
instance $X$ could be a Hilbert space or an $L^p$ space with $1 < p < \infty$. Let $B$ be
the closed unit ball in $X$. Consider a mapping $F : B \to X$ which is a continuous,
compact perturbation of the identity: $F = I + G$, with $G$ compact.

We will now generalize the concept of the "set-valued directional derivative"
used in Kummer [Ku 91] to the context of Banach spaces. Let $\varphi : [0, \infty) \to
[0, \infty)$ be a continuous mapping such that $\varphi(z) = 0$ if and only if $z = 0$. Let $S$ be
the set of limit points of $\partial B$. Here, and in all subsequent discussions, the limit in
$X$ is taken in the sense of the weak topology on $X$.


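Definition 5.3.1 We define a set-valued directional derivative of the mapping $F$
relative to $\varphi$ at the point 0 in the direction $u \in S$ to be

\[ A_\varphi F(0)u = \left\{ v : v = \lim_{k \to \infty} \frac{F(x_k + \lambda_k u_k) - F(x_k)}{\varphi(\lambda_k)},
   \ \text{where } x_k \to 0,\ \lambda_k \downarrow 0,\ u_k \to u,\ \|u_k\| = 1 \right\}. \]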


Remark 5.3.2 A similar definition was introduced by B. Kummer in [Ku 91]
where he defined

\[ \Delta F(0; u) = \left\{ v : v = \lim_{k \to \infty} \frac{F(x_k + \lambda_k u) - F(x_k)}{\lambda_k},
   \ x_k \to 0,\ \lambda_k \downarrow 0 \right\}. \tag{5.17} \]

In case $\varphi(z) = z$, $X$ is finite dimensional, and $F$ is locally Lipschitz, then we have

\[ \frac{\| F(x_k + \lambda_k u_k) - F(x_k + \lambda_k u) \|}{\lambda_k} \le L\, \| u_k - u \| \to 0. \]

Therefore, in this situation, $A_\varphi F(0)u = \Delta F(0; u)$ holds, where $\Delta F(0; u)$ is as in
(5.17). □


² A compact mapping sends bounded sets to relatively compact sets.
Our theorem is as follows:

Theorem 5.3.3 If

\[ 0 \notin A_\varphi F(0)u \tag{5.18} \]

for each $u \in S$, then $F$ is locally invertible at 0.

Proof. We claim that (5.18) implies the existence of a number $\epsilon > 0$ such that

\[ \| F(x_1) - F(x_2) \| \ge \epsilon\, \varphi(\| x_1 - x_2 \|) \tag{5.19} \]

for each $x_1, x_2 \in B_\epsilon = \{ x : \|x\| < \epsilon \}$. In fact, if (5.19) is not true, then there is a
sequence $\{(x_{1,i}, x_{2,i})\}$ such that

\[ \| F(x_{1,i}) - F(x_{2,i}) \| < \frac{1}{i}\, \varphi(\| x_{1,i} - x_{2,i} \|), \quad
   x_{1,i}, x_{2,i} \to 0, \quad x_{1,i} \ne x_{2,i}. \tag{5.20} \]

We set

\[ \lambda_i = \| x_{2,i} - x_{1,i} \|, \qquad
   u_i = \frac{x_{2,i} - x_{1,i}}{\| x_{2,i} - x_{1,i} \|}, \qquad
   x_i = x_{1,i}. \]

Then $x_{2,i} = x_i + \lambda_i u_i$. Since $\|u_i\| = 1$ and $X$ is reflexive, we can assume that
$u_i$ converges to some $u \in S$ weakly (by the Eberlein-Šmulian theorem; see
Zeidler [Ze 86; page 777]). Then (5.20) implies that $0 \in A_\varphi F(0)u$, which is a
contradiction to (5.18).

Thus we know that (5.19) holds. So $F$ is one-to-one near 0. Since $F$ is a continuous,
compact perturbation of the identity, we can apply the invariance of domain
theorem (5.4.11) of Berger [Be 77] to conclude that $F$ is a local homeomorphism
at 0. This completes the proof. □


Corollary 5.3.4 If $\varphi$ is increasing, then the inverse mapping $\psi = F^{-1}$ (given by
the theorem) satisfies

\[ \| \psi(x_1) - \psi(x_2) \| \le \varphi^{-1}\!\left( \frac{\| x_1 - x_2 \|}{c} \right) \]

for some constant $c > 0$ and for $x_1, x_2$ near $F(0)$.
Proof. The result is immediate from (5.19). □
Next we give some examples to illustrate the ideas we have presented so far. 


Example 5.3.5 Let $X = \mathbb{R}$ and $F(x) = x^{1/3}$. Of course we know by inspection
that this function is invertible (globally). But it certainly does not satisfy the
hypothesis of the standard inverse function theorem at 0. Let us see instead how it
fits into the rubric of our new theorem.
Let $\varphi(z) = z$. Clearly

\[ A_\varphi F(0)u = \emptyset. \]

Thus, by our theorem, $F$ is invertible at 0. Observe that this $F$ is not even
Lipschitz. □


In the next example, we use our theorem to show that an odd power of x defines 
an invertible function. Since even powers of x fail to be invertible near the origin, 
the argument we use must rely on some significant distinction between odd and 
even powers. We isolate the salient fact in the following lemma. 


Lemma 5.3.6 Let $k$ be a positive integer. For $x, h \in \mathbb{R}$, it holds that

\[ |(x + h)^{2k} - x^{2k}| = 0 \ \text{ if and only if } \ h = 0 \text{ or } x = -\tfrac{1}{2} h, \tag{5.21} \]

and

\[ |(x + h)^{2k+1} - x^{2k+1}| \ge 2^{-2k}\, |h|^{2k+1}. \tag{5.22} \]


Proof. The result in (5.21) is clear. Likewise, (5.22) is clear if $h = 0$. Thus, we
may assume that $h \ne 0$.
Using the binomial theorem, we compute

\begin{align}
|(x + h)^{2k+1} - x^{2k+1}|
&= \left| \sum_{j=0}^{2k+1} \binom{2k+1}{j} x^{2k+1-j} h^{j} - x^{2k+1} \right| \notag \\
&= |h| \cdot \left| (2k + 1)\, x^{2k} + \sum_{j=2}^{2k+1} \binom{2k+1}{j} x^{2k+1-j} h^{j-1} \right|. \tag{5.23}
\end{align}

We see that the quantity in (5.23) clearly diverges to $+\infty$ as $|x| \to \infty$. Thus, the
left-hand side of (5.22) must attain its minimum at some value of $x$. We find that
value of $x$ by differentiating $(x + h)^{2k+1} - x^{2k+1}$ (with respect to $x$) and setting
that derivative equal to 0. That is, we need to solve

\[ (x + h)^{2k} - x^{2k} = 0 \]

for $x$. But, by (5.21) and the assumption that $h \ne 0$, we know that the only solution
is when $x = -\tfrac{1}{2} h$. The result now follows by observing that, when $x = -\tfrac{1}{2} h$,
(5.22) is an equality. □
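The inequality (5.22) and its equality case are easy to spot-check with exact rational arithmetic. The short Python sketch below is an added illustration; the function names `gap` and `lower_bound` are hypothetical, and a finite grid of sample points is of course only a sanity check, not a proof.

```python
from fractions import Fraction

def gap(x, h, k):
    """Left-hand side of (5.22): |(x + h)^(2k+1) - x^(2k+1)|."""
    n = 2 * k + 1
    return abs((x + h) ** n - x ** n)

def lower_bound(h, k):
    """Right-hand side of (5.22): 2^(-2k) |h|^(2k+1)."""
    return Fraction(1, 4 ** k) * abs(h) ** (2 * k + 1)

def check(k, h, xs):
    """True if (5.22) holds at every sample x and is an equality at x = -h/2."""
    holds = all(gap(x, h, k) >= lower_bound(h, k) for x in xs)
    tight = gap(-h / 2, h, k) == lower_bound(h, k)
    return holds and tight
```

For instance, `check(2, Fraction(5, 7), [Fraction(n, 4) for n in range(-12, 13)])` confirms the bound on a grid and the equality at $x = -\tfrac{1}{2}h$ for $k = 2$.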


Example 5.3.7 Let $X = \mathbb{R}$ and $F(x) = x^n$ for $n > 1$ an odd integer. Again, this
$F$ does not satisfy the hypothesis of the standard inverse function theorem at 0
(because the derivative at 0 vanishes). We set $\varphi(z) = z^n$. Since $n$ is odd, we can
use (5.22) to see that $v \in A_\varphi F(0)u$ for $|u| = 1$ implies that $|v| \ge 2^{1-n}$. As a
result, our theorem gives that $F$ is invertible near 0.

Note by contrast that $0 \in \Delta F(0; u)$ (as in the remark following Definition 5.3.1).
So more classical approaches to the inverse function theorem will not yield the local
invertibility of $F$. □
For the next example we need to introduce some machinery. Namely, we consider
the special case when $F(x) = x - G(x)$, for $G : X \to Y$, a locally Lipschitz
mapping. We also assume that $Y$ is a Banach space that is compactly imbedded
into $X$. It turns out that this set of operators is well suited to the study of certain
boundary value problems (as the next example will illustrate). In the present
discussion, we no longer need assume that $X$ is reflexive.


Theorem 5.3.8 Let $X$, $Y$ be Banach spaces with $Y$ compactly imbedded into $X$.
Consider $F(x) = x - G(x)$, where $G : X \to Y$ is locally Lipschitz. If $0 \notin
\Delta F(0; u)$ for every $u$ with $\|u\| = 1$, then $F$ is locally invertible at 0.


Remark 5.3.9 Note that it is immediate that $0 \in \Delta F(0; u)$ if and only if $u \in
\Delta G(0; u)$.


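Proof. Proceeding as in the proof of Theorem 5.3.3, we will show that $0 \notin
\Delta F(0; u)$ implies the existence of a number $\epsilon > 0$ such that

\[ \| F(x_1) - F(x_2) \|_Y \ge \epsilon\, \| x_1 - x_2 \|_X \tag{5.24} \]

for each $x_1, x_2 \in B_\epsilon = X \cap \{ x : \|x\| < \epsilon \}$. In fact, if (5.24) is not true, then
there is a sequence $\{(x_{1,i}, x_{2,i})\}$ such that

\[ \| F(x_{1,i}) - F(x_{2,i}) \|_Y < \frac{1}{i}\, \| x_{1,i} - x_{2,i} \|_X, \quad
   x_{1,i}, x_{2,i} \to 0, \quad x_{1,i} \ne x_{2,i}. \tag{5.25} \]

We set

\[ \lambda_i = \| x_{2,i} - x_{1,i} \|_X, \qquad
   u_i = \frac{x_{2,i} - x_{1,i}}{\| x_{2,i} - x_{1,i} \|_X}, \qquad
   x_i = x_{1,i}. \]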

Then $x_{2,i} = x_i + \lambda_i u_i$. By (5.25) and the fact that $Y$ is compactly imbedded into
$X$, we see that

\[ \left\| u_i - \frac{G(x_i + \lambda_i u_i) - G(x_i)}{\lambda_i} \right\|_Y
   = \frac{\| F(x_{2,i}) - F(x_{1,i}) \|_Y}{\lambda_i} < \frac{1}{i}. \]

Since $G$ is locally Lipschitz, we have

\[ \| G(x_i + \lambda_i u_i) - G(x_i) \|_Y \le L\, \| \lambda_i u_i \|_X. \]

Since $Y$ is compactly imbedded into $X$, we see that we can pass to a subsequence,
without changing notation, so that

\[ \frac{G(x_i + \lambda_i u_i) - G(x_i)}{\lambda_i} \]

converges. Thus we also have that $u_i$ converges to $u \in S$.
Again using the fact that $G$ is locally Lipschitz, we have

\[ \| G(x_i + \lambda_i u_i) - G(x_i + \lambda_i u) \|_Y \le L \cdot \lambda_i\, \| u_i - u \|_X, \]

and $\| u_i - u \|_X \to 0$. As a result, we have

\[ 0 = u - \lim_{i \to \infty} \frac{G(x_i + \lambda_i u) - G(x_i)}{\lambda_i}
     = \lim_{i \to \infty} \frac{F(x_i + \lambda_i u) - F(x_i)}{\lambda_i}, \]

contradicting the assumption that $0 \notin \Delta F(0; u)$.
Thus we see that (5.24) holds, and the result now follows from the invariance
of domain theorem (5.4.11) of Berger [Be 77]. □


Now we will introduce a new class of examples and ideas. For this we need a 
definition. 


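Definition 5.3.10 Consider a mapping $f : \mathbb{R}^m \to \mathbb{R}^n$. We say that $f$ has the
property $\mathcal{J}$ at 0 if there exist $r_0 > 0$, $\lambda_0 > 0$, a bounded set $\mathcal{M}_0$ of $n \times m$
matrices, and a constant $M_0 > 0$ such that, for each $x \in \mathbb{R}^m$ with $|x| < r_0$, each
$z \in \mathbb{R}^m$ with $|z| \le 1$, and each $\lambda \in \mathbb{R}$ with $0 < \lambda < \lambda_0$, there is a matrix $A \in \mathcal{M}_0$ with

\[ \left| \frac{f(x + \lambda z) - f(x)}{\lambda} - Az \right| \le M_0\, (\lambda + |x|). \]

For $u \in \mathbb{R}^m$, we will write

\[ \partial_{\mathcal{M}_0} f(0; u) = \{ Au : A \in \mathcal{M}_0 \} \subset \mathbb{R}^n. \tag{5.26} \]

Remark 5.3.11


(1) Observe that none of $r_0$, $\lambda_0$, $\mathcal{M}_0$, and $M_0$ in the definition is assumed to be
unique, so of course, neither is $\partial_{\mathcal{M}_0} f(0; u)$.


(2) Note that $\partial_{\mathcal{M}_0} f(0; u)$ is compact for any choice of $u$.
(3) If $f$ is $C^2$, then $f$ has the property $\mathcal{J}$ at 0 and one may set $\mathcal{M}_0 = \{Df(0)\}$.


Example 5.3.12 Suppose $a < b$ are real numbers and $r : \mathbb{R} \to \mathbb{R}$ is $C^2$ with
$r(0) = r'(0) = 0$. Set

\[ f(t) = \begin{cases} r(t) + at & \text{for } t \ge 0, \\ r(t) + bt & \text{for } t < 0. \end{cases} \]

Then the function $f(t)$ has the property $\mathcal{J}$ at 0 with

\[ \mathcal{M}_0 = \{ A : At = ct,\ c \in [a, b] \}. \] □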
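A numerical sanity check of Example 5.3.12 can make the definition concrete. The Python sketch below is an added illustration with assumed data: $r(t) = t^2$, $a = 1$, $b = 2$, and the constant $M_0 = 2$ (which equals $\sup |r''|$ for this choice of $r$). The function `defect` (a hypothetical name) measures the distance from the difference quotient to the segment $\{cz : c \in [a, b]\}$, which property $\mathcal{J}$ requires to be at most $M_0(\lambda + |x|)$.

```python
def f(t, a=1.0, b=2.0):
    """Assumed instance of Example 5.3.12 with r(t) = t^2."""
    r = t * t
    return r + (a * t if t >= 0 else b * t)

def defect(x, z, lam, a=1.0, b=2.0):
    """Distance from the difference quotient (f(x + lam z) - f(x)) / lam
    to the segment {c z : c in [a, b]}."""
    q = (f(x + lam * z, a, b) - f(x, a, b)) / lam
    lo, hi = sorted((a * z, b * z))
    return max(lo - q, 0.0, q - hi)

def satisfies_bound(x, z, lam, M0=2.0):
    """Check the property-J inequality at one sample, with a small rounding allowance."""
    return defect(x, z, lam) <= M0 * (lam + abs(x)) + 1e-12
```

The piecewise linear part of $f$ contributes exactly $cz$ for some $c \in [a, b]$ (even when $x$ and $x + \lambda z$ lie on opposite sides of 0), so the whole defect comes from $r$ and is bounded by $|2x + \lambda z|\,|z| \le 2(\lambda + |x|)$.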
Example 5.3.13 Consider the boundary value problem (for an $\mathbb{R}^n$-valued function
$u(t)$, $t \in \mathbb{R}$)

\[ u'' = f(u) + h(t), \qquad u(0) = u(1) = 0, \tag{5.27} \]

where $f : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitzian, has the property $\mathcal{J}$ at 0, and satisfies
$f(0) = 0$. Also we assume that $h$ is continuous.
Set

\[ X = C^0[0, 1], \qquad Y = C^2[0, 1], \qquad F(x) = x - G(x), \qquad G(x) = \mathcal{K}(f \circ x), \]

where $\mathcal{K} : X \to Y$ is the inverse of the mapping from $C^2 \cap \{ u : u(0) = u(1) = 0 \}$
to $C^0$ given by $u \mapsto u''$. That is, if $x \in X = C^0[0, 1]$, then $\mathcal{K}(x)$ is the unique $C^2$
function $u$ (on $[0, 1]$) with $u''(t) = x(t)$ and satisfying $u(0) = u(1) = 0$. Note
that $\mathcal{K}$ is linear.
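For readers who want to experiment, the operator that inverts $u \mapsto u''$ under the boundary conditions $u(0) = u(1) = 0$ can be written with the standard Green's function for this two-point problem. The Python sketch below is an added illustration for the scalar case only ($n = 1$); the names `green` and `apply_K`, the trapezoid quadrature, and the grid size are all assumptions made here, not part of the text.

```python
def green(t, s):
    """Green's function for u'' = x on [0, 1] with u(0) = u(1) = 0."""
    return s * (t - 1.0) if s <= t else t * (s - 1.0)

def apply_K(x, t, n=200):
    """Approximate (K x)(t) = integral over [0, 1] of green(t, s) x(s) ds
    using the composite trapezoid rule with n subintervals."""
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        s = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * green(t, s) * x(s)
    return total * h
```

With the constant input `x = lambda s: 1.0`, the exact solution of $u'' = 1$, $u(0) = u(1) = 0$ is $u(t) = (t^2 - t)/2$, and `apply_K` reproduces it at grid points up to rounding. Linearity of the operator is visible directly from the integral formula.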

Preparatory to applying Theorem 5.3.8, we will investigate $\Delta G(0; u)$ for $u \in X$
with $\|u\| = \sup\{ |u(t)| : t \in [0, 1] \} = 1$. Fix such a $u$ and consider sequences
$x_k \in X$ and $\lambda_k \in \mathbb{R}$, $k = 1, 2, \ldots$, with $\|x_k\| < r_0$, $x_k \to 0$, $0 < \lambda_k < \lambda_0$, and
$\lambda_k \to 0$, where $r_0$ and $\lambda_0$ are as in Definition 5.3.10. For each $k$, we set

\[ w_k = \frac{G(x_k + \lambda_k u) - G(x_k)}{\lambda_k}
       = \frac{\mathcal{K}\bigl(f(x_k + \lambda_k u)\bigr) - \mathcal{K}\bigl(f(x_k)\bigr)}{\lambda_k} \]

and suppose $w_k \to w \in Y$. For each $k$, we have

\[ w_k = \mathcal{K} v_k, \]

where

\[ v_k = \frac{f(x_k + \lambda_k u) - f(x_k)}{\lambda_k}, \]

and because $f$ has the property $\mathcal{J}$, we also have

\[ \operatorname{dist}\bigl[ v_k(t),\ \partial_{\mathcal{M}_0} f(0; u(t)) \bigr] \le M_0\, (\lambda_k + |x_k(t)|) \quad \text{for } 0 \le t \le 1. \tag{5.28} \]

Setting

\[ \Gamma(u) = \{ s \in X : s(t) \in \partial_{\mathcal{M}_0} f(0; u(t)) \ \text{for } 0 \le t \le 1 \}, \]

it follows from (5.28) that

\[ w \in \mathcal{K}\, \Gamma(u). \]

We conclude that

\[ \Delta G(0; u) \subset \mathcal{K}\, \Gamma(u). \]

As a consequence of Theorem 5.3.8, we have the following theorem:

If, for each $0 \ne u \in X = C^0[0, 1]$, we have that $u \notin \mathcal{K}\, \Gamma(u)$, then
the system (5.27) has precisely one small (in the Banach space norm)
solution for any continuous $h$ with $\|h\|$ small.


5.4 Some Singular Cases of the Implicit Function 
Theorem 


The standard implicit/inverse function theorem requires that the function in question
be $C^1$ and that its Jacobian matrix be nondegenerate in a suitable sense, but
simple examples show that something is still true even when the Jacobian matrix
degenerates. For instance, the function $f(x) = x^3$ on the real line fails the Jacobian
test at the origin. But $f''(0) = 0$ and $f'''(0) \ne 0$, and then elementary Taylor
series considerations show that the function must be locally invertible at the origin.
(Of course the function is obviously invertible just by inspection. Also the
methods of Section 5.3, Example 5.3.7 could be applied, but we are now adopting
the point of view of the inverse function theorem.)

The purpose of the present section is to explore the types of implicit and inverse 
function theorems that might be true in the case of a degenerate Jacobian matrix. 
Although there is an alternative treatment in Lefschetz [Le 57; pages 163-169], 
there does not seem to be any single all-encompassing theorem about this situ- 
ation. We content ourselves here with the treatment of some illustrative special 
situations and some accompanying examples. We follow closely the treatment in 
Loud [Lo 61]. 


Preliminary Remarks 
Let $F(x, y, z)$ and $G(x, y, z)$ be $C^1$ in a neighborhood of the origin and satisfy

\[ F(0, 0, 0) = G(0, 0, 0) = 0. \]

Since the two equations

\[ F(x, y, z) = 0, \qquad G(x, y, z) = 0 \tag{5.29} \]

involve three variables, the usual implicit function theorem paradigm is that two of
the variables, say $x$ and $y$, should be expressible as a function of the third variable,
$z$. Of course, the hypothesis required to apply the implicit function theorem is the
nonvanishing at the origin of the Jacobian

\[ J = \det \begin{pmatrix} F_x & F_y \\ G_x & G_y \end{pmatrix} \tag{5.30} \]

of the pair of functions $F$ and $G$ considered as a mapping

\[ (x, y) \mapsto (F(x, y, z), G(x, y, z)) \]

(so essentially $z$ serves as a parameter).

In (5.30), and throughout this section, subscripts will denote partial derivatives. 
Additionally, we will use a superscript “0” to indicate that a partial derivative is to 
be evaluated at the origin, so for example 


\[ G_y^0 = \frac{\partial G}{\partial y}(0, 0, 0) \]
and later, when $J$ is a function of two variables,
\[ J_y^0 = \frac{\partial J}{\partial y}(0, 0). \]

In this section we consider solving for $x$ and $y$ in terms of $z$ in a neighborhood
of $(0, 0, 0)$, even without the hypothesis that $J \ne 0$. Of course, just as in the usual
application of the implicit function theorem, if such functions $x(z)$ and $y(z)$ exist
and are differentiable, then (by the chain rule) their derivatives $dx/dz$ and $dy/dz$
at $z = 0$ must satisfy


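\[
\begin{aligned}
F_x^0\,\frac{dx}{dz} + F_y^0\,\frac{dy}{dz} + F_z^0 &= 0,\\
G_x^0\,\frac{dx}{dz} + G_y^0\,\frac{dy}{dz} + G_z^0 &= 0.
\end{aligned}
\]

Thus it is clearly necessary (for the existence and finiteness of these derivatives)
that the matrix
\[ \begin{pmatrix} F_x^0 & F_y^0 & F_z^0 \\ G_x^0 & G_y^0 & G_z^0 \end{pmatrix} \tag{5.31} \]
have the same rank as the Jacobian matrix
\[ \begin{pmatrix} F_x^0 & F_y^0 \\ G_x^0 & G_y^0 \end{pmatrix}. \tag{5.32} \]

If in fact the matrix (5.31) has rank greater than the rank of the matrix (5.32), then
the differentiable solution we seek does not exist.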


The Case of Jacobian Matrix of Rank 1 


We assume now that the rank of both the matrices (5.31) and (5.32) is 1. At least
one of the entries in (5.32) is nonzero. For specificity, let us suppose that
\[ \frac{\partial F}{\partial x}(0, 0, 0) = F_x^0 \ne 0. \]
Because (5.31) has rank 1 and because we have assumed $F_x^0 \ne 0$, it holds that
\[ G_y^0 = \frac{G_x^0}{F_x^0}\,F_y^0 \quad\text{and}\quad G_z^0 = \frac{G_x^0}{F_x^0}\,F_z^0. \tag{5.33} \]

We will replace the system $F(x, y, z) = 0$, $G(x, y, z) = 0$ by an equivalent but
simpler system. Set
\[ H(x, y, z) = G(x, y, z) - \frac{G_x^0}{F_x^0}\,F(x, y, z). \]

It is obvious that $H_x^0 = 0$ and we can also apply (5.33), so we see that
\[ H_x^0 = H_y^0 = H_z^0 = 0 \tag{5.34} \]
holds. Clearly, the system
\[
\begin{aligned}
F(x, y, z) &= 0,\\
H(x, y, z) &= 0
\end{aligned}
\tag{5.35}
\]
is equivalent to the original system (5.29).

Because we have made the normalizing assumption that $F_x^0 \ne 0$, we can apply
the implicit function theorem to solve the first equation $F(x, y, z) = 0$ in (5.35)
for $x$ as a function of $y$ and $z$ near $y = z = 0$. If the result is written $x = f(y, z)$,
then $f(0, 0) = 0$ and the partial derivatives of $f(y, z)$ can be computed from the
partial derivatives of $F(x, y, z)$. In a neighborhood of $(0, 0)$, we have
\[
f_y = -\frac{F_y[f(y, z), y, z]}{F_x[f(y, z), y, z]}
\quad\text{and}\quad
f_z = -\frac{F_z[f(y, z), y, z]}{F_x[f(y, z), y, z]}.
\tag{5.36}
\]
Substituting $x = f(y, z)$ into the second equation, $H(x, y, z) = 0$, in (5.35), we
eliminate the unknown $x$ from the system.
We now consider the resulting equation
\[ J(y, z) = H[f(y, z), y, z] = 0. \tag{5.37} \]


If (5.37) is solved for $y$ as a function of $z$, $y = y(z)$, with $dy/dz$ finite at $z = 0$,
then this $y(z)$, together with $x(z) = f(y(z), z)$, furnishes the desired solution of
the system $F(x, y, z) = 0$, $H(x, y, z) = 0$. Furthermore, the derivative $dx/dz$
would then be given by
\[ \frac{dx}{dz} = -\frac{F_y \cdot (dy/dz) + F_z}{F_x}. \tag{5.38} \]
It remains to see how to solve for $y$ as a function of $z$. If we assume that all
the needed derivatives of $F$ and $H$ exist and are continuous, then the derivatives
of $J(y, z)$ can be computed in terms of those of $H$ and $F$ by using (5.36). In
particular, we see that
\[
\begin{aligned}
J_y &= (-H_x F_y + H_y F_x)/F_x,\\
J_z &= (-H_x F_z + H_z F_x)/F_x,
\end{aligned}
\]


so, by (5.34), we have $J_y^0 = J_z^0 = 0$.

The second and third derivatives of $J$ are necessarily given by more complicated
expressions. For instance, we have
\[
\begin{aligned}
J_{yy} ={}& (H_{xx} F_y^2 - 2 H_{xy} F_y F_x + H_{yy} F_x^2)/F_x^2\\
& - H_x (F_y^2 F_{xx} - 2 F_x F_y F_{xy} + F_x^2 F_{yy})/F_x^3.
\end{aligned}
\]
While the second line in the preceding expression for $J_{yy}$ is zero at $(0, 0)$ because
$H_x^0 = 0$, there is no reason for the expression on the right in the first line to equal
zero. More generally, the vanishing of $J_{yy}$ or any other second or higher derivative
of $J$ at $(0, 0)$ would be a special occurrence indicating a relationship among the
second or higher derivatives of $F$ and $G$ at the origin.
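Since the first partial derivatives of $J(y, z)$ vanish at $(0, 0)$, we may write
\[
\begin{aligned}
J(y, z) ={}& \tfrac12 J_{yy}^0 y^2 + J_{yz}^0 yz + \tfrac12 J_{zz}^0 z^2\\
& + \tfrac16 J_{yyy}^0 y^3 + \tfrac12 J_{yyz}^0 y^2 z + \tfrac12 J_{yzz}^0 y z^2 + \tfrac16 J_{zzz}^0 z^3
+ \text{higher-order terms}.
\end{aligned}
\tag{5.39}
\]
We want to find a solution of the equation $J(y, z) = 0$ which has $y = 0$ at $z = 0$
and for which $z$ takes nonzero values. We write $y = \eta z$ in (5.39), where $\eta$ will be
a real-valued function of $z$. Then we find that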


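\[
\begin{aligned}
J(y, z) ={}& z^2 \left( \tfrac12 J_{yy}^0 \eta^2 + J_{yz}^0 \eta + \tfrac12 J_{zz}^0 \right)\\
& + z^3 \left( \tfrac16 J_{yyy}^0 \eta^3 + \tfrac12 J_{yyz}^0 \eta^2 + \tfrac12 J_{yzz}^0 \eta + \tfrac16 J_{zzz}^0 \right)
+ \text{higher-order terms},
\end{aligned}
\tag{5.40}
\]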

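so that for $z \ne 0$, $J(y, z) = 0$ is equivalent to
\[
\begin{aligned}
\tilde{J}(\eta, z) ={}& \left( \tfrac12 J_{yy}^0 \eta^2 + J_{yz}^0 \eta + \tfrac12 J_{zz}^0 \right)\\
& + z \left( \tfrac16 J_{yyy}^0 \eta^3 + \tfrac12 J_{yyz}^0 \eta^2 + \tfrac12 J_{yzz}^0 \eta + \tfrac16 J_{zzz}^0 \right)
+ \text{higher-order terms} = 0.
\end{aligned}
\tag{5.41}
\]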
We shall solve (5.41) for $\eta$ as a function of $z$. The solution of $J(y, z) = 0$ for the
variable $y$ will then be given by $y(z) = z\eta(z)$.

Consider the quadratic equation
\[ \tfrac12 J_{yy}^0 \eta^2 + J_{yz}^0 \eta + \tfrac12 J_{zz}^0 = 0, \tag{5.42} \]
which is just the limit of the equation (5.41) as $z \to 0$. If $\eta = \eta(z)$ is a solution of
(5.41), then $\eta(0)$ must satisfy (5.42). Now there are four cases depending on the
sign of the discriminant $(J_{yz}^0)^2 - J_{yy}^0 J_{zz}^0$ and the signs of the various second
partial derivatives $J_{yy}^0$, $J_{yz}^0$, and $J_{zz}^0$. Typically, either Case I or Case II below would
apply.


Case I:
\[ (J_{yz}^0)^2 - J_{yy}^0 J_{zz}^0 < 0 \]
or
\[ J_{yy}^0 = J_{yz}^0 = 0 \ \text{and}\ J_{zz}^0 \ne 0 \]

In this case, (5.42) has no real roots. Thus no real value for $\eta(0)$ can exist, so a
solution of the type we seek does not exist.


Case II:
\[ (J_{yz}^0)^2 - J_{yy}^0 J_{zz}^0 > 0 \]

In this case, (5.42) has one or two simple real roots, according as $J_{yy}^0 = 0$ or
$J_{yy}^0 \ne 0$. In either case, the ordinary implicit function theorem then guarantees a
solution $\eta = \eta(z)$ of the equation (5.41) for each such simple real root of (5.42).
If $\eta_0$ is such a real root, then we have
\[
\begin{aligned}
\eta(z) &= \eta_0 + o(1),\\
y(z) &= z\eta(z) = \eta_0 z + o(z),\\
\frac{dy}{dz} &= \eta_0 \quad\text{at } z = 0.
\end{aligned}
\]
From (5.38) we find that at $z = 0$ we have
\[ \frac{dx}{dz} = -\frac{F_y^0 \eta_0 + F_z^0}{F_x^0}. \tag{5.43} \]

Case III:
\[ (J_{yz}^0)^2 - J_{yy}^0 J_{zz}^0 = 0 \ \text{and}\ J_{yy}^0 \ne 0 \]
In this case, we can still solve for $\eta(z)$ either for positive $z$ or (in certain cases)
for negative $z$. However, we will have to utilize fractional powers of $z$ (very much
in the spirit of Puiseux series; see Krantz and Parks [KP 92]). Let $\eta_0$ be a double
root of (5.42). The equation (5.41) then becomes
\[
\tfrac12 J_{yy}^0 (\eta - \eta_0)^2
+ z \left( \tfrac16 J_{yyy}^0 (\eta - \eta_0)^3 + J_1 (\eta - \eta_0)^2 + J_2 (\eta - \eta_0) + J_3 \right)
+ \text{higher-order terms} = 0.
\]
Here
\[
\begin{aligned}
J_1 &= \tfrac12 J_{yyy}^0 \eta_0 + \tfrac12 J_{yyz}^0,\\
J_2 &= \tfrac12 J_{yyy}^0 \eta_0^2 + J_{yyz}^0 \eta_0 + \tfrac12 J_{yzz}^0,\\
J_3 &= \tfrac16 J_{yyy}^0 \eta_0^3 + \tfrac12 J_{yyz}^0 \eta_0^2 + \tfrac12 J_{yzz}^0 \eta_0 + \tfrac16 J_{zzz}^0.
\end{aligned}
\]

Subcase III(a):
\[ J_3 = 0 \]

This subcase cannot be resolved without the use of even higher order derivatives,
so we will say no more about it.


Subcase III(b):
\[ J_3 \ne 0 \]


We replace $z$ by $u^2$ if $J_3$ and $J_{yy}^0$ have opposite signs; we replace $z$ by $-u^2$ if $J_3$
and $J_{yy}^0$ have the same sign. Then, taking a square root, we find that either
\[ \eta - \eta_0 = \pm\sqrt{-2J_3/J_{yy}^0}\;u + \text{higher-order terms} \]
or
\[ \eta - \eta_0 = \pm\sqrt{2J_3/J_{yy}^0}\;u + \text{higher-order terms}. \]

As a result, we have these solutions:

If $J_3$ and $J_{yy}^0$ have opposite signs, then there are two solutions $\eta(z)$
for positive $z$ and none for negative $z$.

If $J_3$ and $J_{yy}^0$ have the same sign, then there are two solutions
$\eta(z)$ for negative $z$ and none for positive $z$.

In either case, we have $y = \eta_0 z + O(|z|^{3/2})$ so that, at $z = 0$, $dy/dz = \eta_0$ and
$dx/dz$ is given by (5.43).


Case IV:
\[ J_{yy}^0 = J_{yz}^0 = J_{zz}^0 = 0 \]

In this case, we can replace (5.40) by
\[
\tfrac16 J_{yyy}^0 \eta^3 + \tfrac12 J_{yyz}^0 \eta^2 + \tfrac12 J_{yzz}^0 \eta + \tfrac16 J_{zzz}^0
+ \text{higher-order terms} = 0. \tag{5.44}
\]


Consider now the equation
\[
\tfrac16 J_{yyy}^0 \eta^3 + \tfrac12 J_{yyz}^0 \eta^2 + \tfrac12 J_{yzz}^0 \eta + \tfrac16 J_{zzz}^0 = 0, \tag{5.45}
\]
which is the limit of (5.44) as $z \to 0$. If (5.45) has no real roots, then there are no
solutions of the type that we seek. If $\eta_0$ is a multiple real root of (5.45), then higher
derivatives are needed to produce a solution and we will say nothing about it here.
Finally, if $\eta_0$ is a simple real root of (5.45), then the ordinary implicit function
theorem shows that (5.44) has a solution $\eta = \eta(z)$ with $\eta(z) = \eta_0 + o(1)$. So,
again, $y(z) = \eta_0 z + o(z)$; and, at $z = 0$,
\[ \frac{dy}{dz} = \eta_0, \qquad \frac{dx}{dz} = -\frac{F_y^0 \eta_0 + F_z^0}{F_x^0}. \]
In this final case, there may be as many as three different solutions $x = x(z)$,
$y = y(z)$.
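
The case distinction above depends only on the three numbers $J_{yy}^0$, $J_{yz}^0$, $J_{zz}^0$, so it can be sketched in a few lines of code. The function name `classify_case` and its return convention are illustrative assumptions, not notation from the text; exact inputs (integers or fractions) are assumed so that the degenerate cases are detected by exact comparison with zero.

```python
import math

def classify_case(jyy, jyz, jzz):
    """Classify the quadratic (5.42): (1/2)*jyy*eta^2 + jyz*eta + (1/2)*jzz = 0.

    Returns (case_label, roots), where roots lists the real candidates eta_0.
    Hypothetical helper; inputs are assumed exact (ints or Fractions).
    """
    disc = jyz * jyz - jyy * jzz
    if jyy == 0 and jyz == 0:
        # The quadratic has no eta-dependence at all.
        return ("IV", []) if jzz == 0 else ("I", [])
    if disc < 0:
        return ("I", [])                      # no real roots
    if disc > 0:
        if jyy == 0:
            # Equation is linear in eta: one simple root.
            return ("II", [-jzz / (2 * jyz)])
        s = math.sqrt(disc)
        return ("II", [(-jyz - s) / jyy, (-jyz + s) / jyy])
    # disc == 0 forces jyy != 0 here: a double root.
    return ("III", [-jyz / jyy])
```

In Case III the returned double root still has to be examined through $J_3$, and in Case IV the cubic (5.45) takes over; the sketch stops at the classification.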


The Case of Jacobian Matrix of Rank 0 


In fact this situation is even more complicated, and more technical, than the rank 
1 case. We shall not treat it in any detail, but instead content ourselves with pre- 
senting some examples. The interested reader is referred to Loud [Lo 61] for a 
more complete picture. 


Example 5.4.1 Consider the system
\[
\begin{aligned}
xz + yz + xy - z^3 &= 0,\\
xz + 2yz - xy - 3z^3 &= 0.
\end{aligned}
\tag{5.46}
\]


Since there are no linear terms in the system (5.46), it is clear that the rank of 
the Jacobian matrix is zero at the origin. Nonetheless, we wish to solve (5.46) for 
x and y as functions of z. This can be done by eliminating a variable algebraically, 
but we will illustrate an approach that uses the implicit function theorem and thus 
can be applied more generally. Our method exploits the fact that the coefficient of
$z^2$ is 0 in both equations.

Set $x = \xi z$ and $y = \eta z$. After eliminating common factors of $z^2$, we obtain the
system
\[
\begin{aligned}
\xi + \eta - z + \xi\eta &= 0,\\
\xi + 2\eta - 3z - \xi\eta &= 0.
\end{aligned}
\tag{5.47}
\]
The pair of functions on the left-hand side of (5.47),
\[
\begin{aligned}
F(\xi, \eta, z) &= \xi + \eta - z + \xi\eta,\\
G(\xi, \eta, z) &= \xi + 2\eta - 3z - \xi\eta,
\end{aligned}
\]
satisfies
\[
\begin{pmatrix} F_\xi^0 & F_\eta^0 \\ G_\xi^0 & G_\eta^0 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \tag{5.48}
\]
and obviously $(\xi, \eta, z) = (0, 0, 0)$ is a point that satisfies (5.47). Since the matrix
in (5.48) has rank 2, the implicit function theorem guarantees the existence of
solutions $\xi(z)$ and $\eta(z)$ to the system (5.47) with $\xi(0) = \eta(0) = 0$. Also, the
implicit function theorem allows us to compute the derivatives of $\xi(z)$ and $\eta(z)$
implicitly. We find that $d\xi/dz(0) = -1$ and $d\eta/dz(0) = 2$. Thus, we have
\[ x(z) = -z^2 + o(z^2), \qquad y(z) = 2z^2 + o(z^2). \]
Finally, by inspection, one can see that all points in
\[ \{(x, 0, 0) : x \in \mathbb{R}\} \cup \{(0, y, 0) : y \in \mathbb{R}\} \]
also solve (5.46), giving two additional solution curves passing through the origin.
$\square$
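
As a numerical sanity check of Example 5.4.1, the rescaled system $\xi + \eta - z + \xi\eta = 0$, $\xi + 2\eta - 3z - \xi\eta = 0$ can be solved by Newton's method for a small fixed $z$. This is only a sketch: the function name `solve_example_541` and the iteration count are assumptions made for illustration, and Newton's method stands in for the existence argument, which the text obtains from the implicit function theorem.

```python
def solve_example_541(z, iterations=20):
    """Newton's method in (xi, eta) for the rescaled system at fixed z.

    Hypothetical helper: starts from (0, 0), the solution at z = 0.
    """
    xi, eta = 0.0, 0.0
    for _ in range(iterations):
        f = xi + eta - z + xi * eta
        g = xi + 2.0 * eta - 3.0 * z - xi * eta
        # Jacobian of (f, g) with respect to (xi, eta); equals [[1, 1], [1, 2]] at the origin.
        a, b = 1.0 + eta, 1.0 + xi
        c, d = 1.0 - eta, 2.0 - xi
        det = a * d - b * c
        # Newton update: subtract J^{-1} (f, g).
        xi, eta = xi - (d * f - b * g) / det, eta - (-c * f + a * g) / det
    return xi, eta
```

For small $z$ the ratios `xi / z` and `eta / z` should be close to $-1$ and $2$, and $x = \xi z$, $y = \eta z$ should satisfy $xz + yz + xy - z^3 = 0$ up to rounding.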


Remark 5.4.2 Example 5.4.1 is representative of what often happens with a pair
of equations
\[
\begin{aligned}
F(x, y, z) &= 0,\\
G(x, y, z) &= 0,
\end{aligned}
\tag{5.49}
\]
in which, for both, the nontrivial part of the Taylor polynomial begins with the
quadratic term. Letting $H_F$ and $H_G$ be those nontrivial quadratic forms, we
consider the system
\[
\begin{aligned}
H_F(x, y, z) &= 0,\\
H_G(x, y, z) &= 0,\\
x^2 + y^2 + z^2 &= 1.
\end{aligned}
\tag{5.50}
\]
If a unit vector, $v$, can be found that solves the system (5.50) and if the matrix
of partial derivatives on the left-hand side of (5.50) is non-singular at $v$, then one
can form a new orthonormal basis for $\mathbb{R}^3$ which contains the vector $v$, and after
changing basis, the problem of solving (5.49) will look like Example 5.4.1 in that
the square of one of the new variables will have coefficient 0. $\square$


The next example illustrates how one may still be able to analyze the solutions 
of a pair of equations even when the procedure discussed in Remark 5.4.2 is not 
fruitful. 


Example 5.4.3 Consider the system
\[
\begin{aligned}
x^2 + y^2 - z^2 + z^2 x &= 0,\\
x^2 + 2y^2 - z^2 + z^2 y &= 0.
\end{aligned}
\tag{5.51}
\]


Again there are no linear terms, and thus the rank of the Jacobian matrix is zero 
at the origin. 

Remark 5.4.2 suggests the change of variables $x = (1/\sqrt{2})u - (1/\sqrt{2})w$,
$y = v$, $z = (1/\sqrt{2})u + (1/\sqrt{2})w$, which results in equations with neither the $u^2$
term nor the $w^2$ term. Unfortunately, when we replace $v$ by $\xi u$ and $w$ by $\eta u$ (or
alternatively, replace $u$ by $\xi w$ and $v$ by $\eta w$) and eliminate the common factors
from each equation, we do not obtain a system with full rank (that is, rank 2)
Jacobian matrix with respect to $\xi$ and $\eta$.

Instead, to solve (5.51) for $x$ and $y$ as functions of $z$, we set $x = \xi z$ and $y = \eta z$.
The result is the system
\[
\begin{aligned}
\xi^2 + \eta^2 - 1 + z\xi &= 0,\\
\xi^2 + 2\eta^2 - 1 + z\eta &= 0.
\end{aligned}
\tag{5.52}
\]
Considering the limit of the system (5.52) as $z$ approaches 0, we are led to the
(real) common points of the loci of the equations
\[ \xi^2 + \eta^2 - 1 = 0 \quad\text{and}\quad \xi^2 + 2\eta^2 - 1 = 0, \]
which are $(1, 0)$ and $(-1, 0)$. At $(1, 0)$, we set $u = \xi - 1$, $v = \eta$. The system (5.52)
becomes
\[
\begin{aligned}
2u + u^2 + v^2 + z(u + 1) &= 0,\\
2u + u^2 + 2v^2 + zv &= 0.
\end{aligned}
\tag{5.53}
\]


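If we define the two functions
\[
\begin{aligned}
F(u, v, z) &= 2u + u^2 + v^2 + z(u + 1),\\
G(u, v, z) &= 2u + u^2 + 2v^2 + zv,
\end{aligned}
\]
then we see that
\[
\begin{pmatrix} F_u^0 & F_v^0 \\ G_u^0 & G_v^0 \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 2 & 0 \end{pmatrix} \tag{5.54}
\]
is only of rank 1, so the implicit function theorem cannot be invoked to give us
functions $u(z)$ and $v(z)$. On the other hand, it is the case that the matrix
\[
\begin{pmatrix} F_u^0 & F_v^0 & F_z^0 \\ G_u^0 & G_v^0 & G_z^0 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 1 \\ 2 & 0 & 0 \end{pmatrix} \tag{5.55}
\]
has rank 2, so we will still be able to solve for $u$ and $v$ by elimination. The first
equation of (5.53), when solved for $u$, gives
\[ u = -\tfrac12 z - \tfrac12 v^2 + \tfrac18 z^2 + \text{higher-order terms}. \]
When we substitute this last equation into the second equation of (5.53) we
obtain
\[ -z + v^2 + vz + \tfrac12 z^2 + \text{higher-order terms} = 0. \]
From this equation we see that, for small positive $z$, we can write
\[ v = \pm\sqrt{z} + o(\sqrt{z}). \]
In conclusion, we find that
\[
\begin{gathered}
u = o(\sqrt{z}), \qquad \xi = 1 + o(\sqrt{z}), \qquad \eta = \pm\sqrt{z} + o(\sqrt{z}),\\
x = z + o(z^{3/2}), \qquad y = \pm z^{3/2} + o(z^{3/2}),
\end{gathered}
\]
all for small, positive $z$. The analysis at the point $\xi = -1$, $\eta = 0$ is similar. $\square$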
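
The square-root behavior found in Example 5.4.3 can be checked numerically. The sketch below applies Newton's method to the system $2u + u^2 + v^2 + z(u+1) = 0$, $2u + u^2 + 2v^2 + zv = 0$ at a fixed small $z > 0$, starting from the leading-order guess $u \approx -z$, $v \approx \pm\sqrt{z}$. The function name `solve_example_543`, the starting guess, and the iteration count are assumptions made for illustration only.

```python
import math

def solve_example_543(z, sign=1.0, iterations=30):
    """Newton's method in (u, v) at fixed z > 0; sign selects the branch of v.

    Hypothetical helper. The Jacobian in (u, v) is singular at z = 0 but has
    determinant of order sqrt(z) along either branch for small z > 0.
    """
    u, v = -z, sign * math.sqrt(z)
    for _ in range(iterations):
        f = 2.0 * u + u * u + v * v + z * (u + 1.0)
        g = 2.0 * u + u * u + 2.0 * v * v + z * v
        a, b = 2.0 + 2.0 * u + z, 2.0 * v
        c, d = 2.0 + 2.0 * u, 4.0 * v + z
        det = a * d - b * c
        u, v = u - (d * f - b * g) / det, v - (-c * f + a * g) / det
    return u, v
```

For $z = 10^{-4}$ both branches should give $|v|/\sqrt{z}$ close to 1 and $|u|$ much smaller than $\sqrt{z}$, in line with $v = \pm\sqrt{z} + o(\sqrt{z})$ and $u = o(\sqrt{z})$.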


6 


Advanced Implicit Function Theorems 


6.1 Analytic Implicit Function Theorems 


We will now consider implicit function theorems in both the real analytic and the 
complex analytic (holomorphic) categories. These are obviously closely related, 
as the problem in the real analytic category can be complexified (by replacing
every $x_j$ with a $z_j$) and thereby turned into a holomorphic problem. Conversely,
any complex analytic implicit function theorem situation is a fortiori real analytic
and can therefore be treated with real analytic techniques. And both categories are
subcategories of the $C^\infty$ category.

To make things completely clear we repeat: An implicit function problem with
real analytic data automatically has a $C^\infty$ solution by the classical $C^\infty$ implicit
function theorem. The point is to see that it has a real analytic solution. Likewise,
an implicit function problem with complex analytic (holomorphic) data
automatically has a $C^\infty$ solution by the classical $C^\infty$ implicit function theorem; it also
automatically has a real analytic solution by the real analytic implicit function 
theorem. The point is to see that it has a complex analytic (holomorphic) solution. 

The astute reader will notice that, in the theorem below, we use Cauchy’s 
method of majorization. This is also the principal tool in the celebrated Cauchy— 
Kowalewsky theorem for existence and uniqueness of solutions to partial differ- 
ential equations with real analytic data (see Section 4.1). In fact the implicit func- 
tion theorem can be proved using a suitable existence and uniqueness theorem for 
partial differential equations, as was shown in Section 4.1. 

In the following theorem and its proof we shall use multiindex notation. On
$\mathbb{R}^N$ a multiindex is an $N$-tuple $\alpha = (\alpha_1, \dots, \alpha_N)$, where each $\alpha_j$ is a
nonnegative integer. Then, for $x \in \mathbb{R}^N$,
\[ x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_N^{\alpha_N}. \]

Also $|\alpha| = \alpha_1 + \cdots + \alpha_N$ and $\alpha! = \alpha_1! \cdot \alpha_2! \cdots \alpha_N!$. Finally, if $\alpha = (\alpha_1, \dots, \alpha_N)$
and $\beta = (\beta_1, \dots, \beta_N)$ are both multiindices, then $\alpha + \beta = (\alpha_1 + \beta_1, \dots, \alpha_N + \beta_N)$.
We will use the notation $e_j$ to denote the multiindex $(0, 0, \dots, 1, 0, \dots, 0)$,
with a 1 in the $j$th position and all other entries 0. In calculations involving
multiple variables, multiindex notation is essential for clarity.
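
The multiindex conventions just introduced translate directly into code. In the sketch below a multiindex is represented as a tuple of nonnegative integers; the helper names (`mi_abs`, `mi_factorial`, `mi_power`, `mi_add`, `unit_multiindex`) are illustrative choices, not notation from the text.

```python
from math import factorial, prod

def mi_abs(alpha):
    """|alpha| = alpha_1 + ... + alpha_N."""
    return sum(alpha)

def mi_factorial(alpha):
    """alpha! = alpha_1! * alpha_2! * ... * alpha_N!."""
    return prod(factorial(a) for a in alpha)

def mi_power(x, alpha):
    """x^alpha = x_1^alpha_1 * ... * x_N^alpha_N for a point x in R^N."""
    return prod(xj ** aj for xj, aj in zip(x, alpha))

def mi_add(alpha, beta):
    """Componentwise sum alpha + beta."""
    return tuple(a + b for a, b in zip(alpha, beta))

def unit_multiindex(j, n):
    """e_j in N = n variables: a 1 in the j-th position (1-based), 0 elsewhere."""
    return tuple(1 if i == j else 0 for i in range(1, n + 1))
```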

We will consider power series expansions for a function $g(x, y)$, where $x \in \mathbb{R}^N$
and $y \in \mathbb{R}$. So we will have powers $x^\alpha$, for $\alpha$ a multiindex, and $y^k$, for $k$ a
nonnegative integer. We shall write such a power series expansion as
\[ g(x, y) = \sum_{\alpha, k} a_{\alpha,k}\, x^\alpha y^k. \]

It will be understood in this context that $\alpha$ ranges over all multiindices and $k$
ranges from 0 to $\infty$. If we write $a_{0,0}$, it will therefore be understood that the first
0 is an $N$-tuple $(0, \dots, 0)$ and the second is the single digit 0.

Finally, we need Hadamard’s estimates for the size of the coefficients of a con- 
vergent power series that are given in the next lemma. 


Lemma 6.1.1 Let
\[ F(x, y) = \sum_{\alpha, k} a_{\alpha,k}\, x^\alpha y^k \]
be a function defined by a power series in $x = (x_1, \dots, x_N)$ and $y$ that is
convergent for $|x| < R_1$, $|y| < R_2$. Assume that $F$ is bounded by $K$. Then
\[ |a_{\alpha,k}| \le K R_1^{-|\alpha|} R_2^{-k}. \]

Proof. Exercise for the reader. Note that the constant $K$ comes from an
application of the root test. $\square$


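Theorem 6.1.2 Suppose that the power series
\[ F(x, y) = \sum_{\alpha, k} a_{\alpha,k}\, x^\alpha y^k \tag{6.1} \]
is absolutely convergent for $|x| < R_1$, $|y| < R_2$. If
\[ a_{0,0} = 0 \ \text{and}\ a_{0,1} \ne 0, \tag{6.2} \]
then there exist $r_0 > 0$ and a power series
\[ f(x) = \sum_{|\alpha| > 0} c_\alpha x^\alpha \tag{6.3} \]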


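such that (6.3) is absolutely convergent for $|x| < r_0$ and
\[ F(x, f(x)) = 0. \tag{6.4} \]

Proof. It will be no loss of generality to assume $a_{0,1} = 1$, so that (6.1) takes the
form
\[
F(x, y) = y + \sum_{|\alpha| > 0} (a_{\alpha,0} + a_{\alpha,1}\, y)\, x^\alpha
+ \sum_{|\alpha| \ge 0} \sum_{k=2}^{\infty} a_{\alpha,k}\, x^\alpha y^k. \tag{6.5}
\]
Introducing the notation $b_{\alpha,k} = -a_{\alpha,k}$, we can rewrite the equation $F(x, y) = 0$
as
\[
y = \sum_{|\alpha| > 0} (b_{\alpha,0} + b_{\alpha,1}\, y)\, x^\alpha
+ \sum_{|\alpha| \ge 0} \sum_{k=2}^{\infty} b_{\alpha,k}\, x^\alpha y^k, \tag{6.6}
\]
or $y = B(x, y)$, where
\[
B(x, y) = \sum_{|\alpha| > 0} (b_{\alpha,0} + b_{\alpha,1}\, y)\, x^\alpha
+ \sum_{|\alpha| \ge 0} \sum_{k=2}^{\infty} b_{\alpha,k}\, x^\alpha y^k. \tag{6.7}
\]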


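Substituting $y = f(x)$ into (6.6) with $f(x)$ given by (6.3), we obtain
\[
\begin{aligned}
\sum_{|\alpha| > 0} c_\alpha x^\alpha ={}& \sum_{|\alpha| > 0} b_{\alpha,0}\, x^\alpha
+ \sum_{|\alpha| > 0} \sum_{|\beta| > 0} b_{\alpha,1}\, c_\beta\, x^{\alpha + \beta}\\
& + \sum_{|\alpha| \ge 0} \sum_{k=2}^{\infty} b_{\alpha,k}\, x^\alpha \Bigl( \sum_{|\beta| > 0} c_\beta x^\beta \Bigr)^{k}.
\end{aligned}
\tag{6.8}
\]

If all the series in (6.8) are ultimately shown to be absolutely convergent, then
the order of summation can be freely rearranged. Assuming absolute convergence,
we can equate like powers of $x$ on the left-hand and right-hand sides of (6.8) and
obtain the following recurrence relations: First, we have
\[ c_{e_j} = b_{e_j,0}. \tag{6.9} \]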


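Equation (6.9) allows us to solve for each $c_{e_j}$. Next we indicate how each
coefficient $c_\alpha$ of higher index may be expressed in terms of the $b_{\alpha,k}$ and coefficients $c_\beta$
with index of lower order. In point of fact, let us assume inductively that we have
so solved for $c_\alpha$ for all multiindices $\alpha$ with $|\alpha| \le p$. Then, fixing a multiindex $\alpha$
with $|\alpha| = p$ and identifying like powers of $x$, we find that
\[
\begin{aligned}
c_{\alpha + e_j} ={}& b_{\alpha + e_j, 0}
+ \sum_{\substack{\beta + \gamma = \alpha + e_j \\ |\beta| > 0,\ |\gamma| > 0}} b_{\gamma,1}\, c_\beta\\
& + \sum_{\substack{|\gamma| \ge 0,\ k \ge 2 \\ |\beta^i| > 0,\ \beta^1 + \cdots + \beta^k + \gamma = \alpha + e_j}}
M(\beta^1, \dots, \beta^k) \cdot b_{\gamma,k}\, c_{\beta^1} c_{\beta^2} \cdots c_{\beta^k}.
\end{aligned}
\tag{6.10}
\]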


Here $M(\beta^1, \dots, \beta^k)$ is a suitable multinomial coefficient (and superscripts are
labels, not powers). Observe that each such $M$ is positive. We can see by inspection
that all of the multiindices on coefficients $c$ that occur on the right-hand side have
size less than or equal to $p$. This is the desired recursion.
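
To see the recursion at work, here is a minimal sketch for a single independent variable ($N = 1$). Rather than coding (6.10) term by term, it iterates the fixed-point equation $f = B(x, f)$ on truncated power series; because $B$ has no constant term and no pure linear term in $y$, each pass fixes at least one more coefficient, so this produces the same $c_\alpha$ that the recursion determines. The names `poly_mul` and `implicit_series` and the dictionary encoding of $B$ are assumptions made for illustration.

```python
from fractions import Fraction

def poly_mul(p, q, n):
    """Product of two polynomials (coefficient lists), truncated after degree n."""
    r = [Fraction(0)] * (n + 1)
    for i, pi in enumerate(p):
        if pi == 0:
            continue
        for j, qj in enumerate(q):
            if i + j > n:
                break
            r[i + j] += pi * qj
    return r

def implicit_series(b, n):
    """Coefficients [c_0, ..., c_n] (with c_0 = 0) of f solving f = B(x, f).

    b maps (a, k) to the coefficient b_{a,k} of x^a y^k in B; the entries
    (0, 0) and (0, 1) are assumed absent, as in (6.7).
    """
    f = [Fraction(0)] * (n + 1)
    for _pass in range(n):
        new = [Fraction(0)] * (n + 1)
        for (a, k), coeff in b.items():
            if a > n:
                continue
            term = [Fraction(0)] * (n + 1)
            term[a] = Fraction(coeff)
            for _power in range(k):      # multiply by f exactly k times
                term = poly_mul(term, f, n)
            for i in range(n + 1):
                new[i] += term[i]
        f = new
    return f
```

For example, $F(x, y) = y - x - y^2$ gives $B(x, y) = x + y^2$, and `implicit_series({(1, 0): 1, (0, 2): 1}, 5)` returns the coefficients 0, 1, 1, 2, 5, 14 of $f(x) = \bigl(1 - \sqrt{1 - 4x}\bigr)/2$.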

While the recurrence relations (6.9) and (6.10) uniquely determine the coeffi- 
cients cg in the power series for the implicit function, it is also necessary to show 
that (6.3) is convergent. The easiest way to obtain the needed estimates is by using 
the method of majorants, which we describe next. 

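Consider two power series in the same number of variables:
\[
\Phi(x_1, x_2, \dots, x_p) = \sum_{j_1, j_2, \dots, j_p = 0}^{\infty}
\varphi_{j_1, j_2, \dots, j_p}\, x_1^{j_1} x_2^{j_2} \cdots x_p^{j_p}, \tag{6.11}
\]
\[
\Psi(x_1, x_2, \dots, x_p) = \sum_{j_1, j_2, \dots, j_p = 0}^{\infty}
\psi_{j_1, j_2, \dots, j_p}\, x_1^{j_1} x_2^{j_2} \cdots x_p^{j_p}. \tag{6.12}
\]

We say that $\Psi(x_1, x_2, \dots, x_p)$ is a majorant of $\Phi(x_1, x_2, \dots, x_p)$ if
\[ |\varphi_{j_1, j_2, \dots, j_p}| \le \psi_{j_1, j_2, \dots, j_p} \tag{6.13} \]
holds for all $j_1, j_2, \dots, j_p$.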
Because all the coefficients $M(\beta^1, \dots, \beta^k)$ in (6.10) are positive, it follows that
if
\[ G(x, y) = \sum_{|\alpha| \ge 0,\ k \ge 0} g_{\alpha,k}\, x^\alpha y^k \]
(with $g_{0,0} = g_{0,1} = 0$) is a majorant of
\[
B(x, y) = \sum_{|\alpha| > 0} (b_{\alpha,0} + b_{\alpha,1}\, y)\, x^\alpha
+ \sum_{|\alpha| \ge 0,\ k \ge 2} b_{\alpha,k}\, x^\alpha y^k
\]
and if
\[ h(x) = \sum_{|\alpha| \ge 1} h_\alpha x^\alpha \tag{6.14} \]
solves
\[ h(x) = G[x, h(x)], \tag{6.15} \]
then $h(x)$ will be a majorant of $f(x)$. Consequently, if the series (6.14) for $h(x)$
is convergent, then the series (6.3) is convergent and its radius of convergence is
at least as large as the radius of convergence for (6.14).

We take
\[ G(x, y) = \frac{Kr}{r - (x_1 + \cdots + x_N) - y}, \]
where
\[ K = \sup\{|B(x, y)| : x \in D(0, R_1),\ y \in D(0, R_2)\} \]
and $r$ is sufficiently small (depending on $R_1$ and $R_2$). We see that $G$ is a majorant
of $B$ by Lemma 6.1.1. For this choice of majorant, the equation (6.15) is quadratic
and can be solved explicitly. The solution is clearly holomorphic at $x = 0$. In fact,
$y = h(x)$ is easily seen to be the solution of the quadratic equation
\[ y^2 + (x_1 + \cdots + x_N - r)\,y + Kr = 0. \tag{6.16} \]
$\square$


Theorem 6.1.2, the main result of this section, is somewhat special. For it only 
considers the case of one dependent variable and arbitrarily many independent 
variables. We leave it to the reader to apply the inductive method of Dini (Section 
3.2) to derive a general real analytic implicit function theorem from our Theorem 
6.1.2. 

Recall that we proved a version of Theorem 6.1.2 in Section 2.4 during our dis- 
cussion of Cauchy’s contributions to this subject. We refer the reader to [Kr 92] 
and [KP 92] for a detailed consideration of various kinds of analytic implicit func- 
tion theorems. 

We close this section by noting that the complex analytic version of the implicit
function theorem follows almost immediately from the $C^1$ version. That is, once
you know that the solution $y = \psi(x)$ to a holomorphic system
\[ F(x, y) = 0 \]
is continuously differentiable, then you can apply $\partial/\partial \bar{z}_j$ to both sides and apply
the chain rule to determine that $\psi$ is holomorphic. Once the complex analytic
case is proved, then the real analytic case follows easily (by the method of
complexification) as already indicated. Further details of these ideas may be found in
[Kr 92].


6.2 Hadamard’s Global Inverse Function Theorem 


We have already established that the implicit function theorem and the inverse 
function theorem are equivalent, so it is sometimes useful to refer to the two theo- 
rems interchangeably. In this section, we will formulate the relevant ideas in terms 
of the inverse function theorem. 

Imagine a continuously differentiable mapping $F : U \to V$, where $U$, $V$ are
open subsets of $\mathbb{R}^N$. Assume that the Jacobian determinant of $F$ is nonzero at
every point. Then we may be sure that $F$ is locally one-to-one. We now wish to
consider under what circumstances $F$ is globally one-to-one. This is not expected
to hold in every case. For example, the mapping $(x, y) \mapsto (e^x \cos y, e^x \sin y)$ is
smooth, locally one-to-one, and its Jacobian determinant is everywhere nonzero,
but nevertheless the mapping is not globally one-to-one.
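
The failure of global injectivity for this mapping is easy to observe numerically: its Jacobian determinant is $e^{2x} > 0$ everywhere, yet shifting $y$ by $2\pi$ does not change the image point. The function names in this small sketch are illustrative.

```python
import math

def exp_map(x, y):
    """The mapping (x, y) -> (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def exp_map_jacobian_det(x, y):
    """Determinant of [[e^x cos y, -e^x sin y], [e^x sin y, e^x cos y]], which is e^(2x)."""
    return math.exp(2.0 * x)

p = exp_map(0.3, 1.0)
q = exp_map(0.3, 1.0 + 2.0 * math.pi)   # a different point with the same image, up to rounding
```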
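Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be a $C^2$ mapping. For convenience of notation, we will
assume $F(0) = 0$. We define $H : \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}^N$ by setting
\[
H(t, x) =
\begin{cases}
F(tx)/t & \text{if } 0 < t,\\
\langle DF(0), x \rangle & \text{if } t \le 0.
\end{cases}
\]

(Recall from Section 3.3 that we use the notation $\langle\ ,\ \rangle$ to denote the application
of the Jacobian matrix to a vector.) Note that $H(0, \cdot)$ is the Jacobian matrix of $F$,
while $H(1, \cdot)$ is $F$, so $H$ restricted to $[0, 1] \times \mathbb{R}^N$ is a homotopy between $F$ and
its Jacobian. It is not difficult to see that $H$ is $C^1$ when $F$ is $C^2$. In fact, we have
\[
\frac{\partial H}{\partial x_j} =
\begin{cases}
\dfrac{\partial F}{\partial x_j}(tx) & \text{if } 0 < t,\\[2ex]
\dfrac{\partial F}{\partial x_j}(0) & \text{if } t \le 0,
\end{cases}
\tag{6.17}
\]
\[
\frac{\partial H}{\partial t} =
\begin{cases}
\bigl[\, t \langle DF(tx), x \rangle - F(tx) \,\bigr] / t^2 & \text{if } 0 < t,\\
0 & \text{if } t \le 0.
\end{cases}
\]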

Now we need a crucial geometric property of this homotopy H. 


Lemma 6.2.1 Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be a $C^2$ mapping. Suppose that $F(0) = 0$
and that the Jacobian determinant of $F$ is nonzero at each point. Then, for each
$y \in \mathbb{R}^N$, $H^{-1}(y)$ consists of a nonempty union of closed arcs. Moreover: If
$A \subset H^{-1}(y)$ is an arc, and if the hyperplane $t = c$ cuts $A$, then it does so transversely
in exactly one point.


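Proof. We write $H(t, x) = (H_1(t, x), H_2(t, x), \dots, H_N(t, x))$. Then $DH$ is the
$N \times (N + 1)$ matrix
\[
\begin{pmatrix}
\dfrac{\partial H_1}{\partial t} & \dfrac{\partial H_1}{\partial x_1} & \cdots & \dfrac{\partial H_1}{\partial x_N}\\
\vdots & \vdots & & \vdots\\
\dfrac{\partial H_N}{\partial t} & \dfrac{\partial H_N}{\partial x_1} & \cdots & \dfrac{\partial H_N}{\partial x_N}
\end{pmatrix}.
\tag{6.18}
\]

By (6.17), for $0 < t$, the $N \times N$ matrix obtained by omitting the first column of
$DH$ is $DF(tx)$. Since $DF$ is nonsingular everywhere, it follows that $DH$ has full
rank at every point.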

Since $DH$ has full rank at each point, it follows from the rank theorem that, for
each $y \in \mathbb{R}^N$, the set $H^{-1}(y)$ consists of a union of closed arcs. These may be
compact or not and some may be topological circles. By inspection, the endpoints
of these arcs must lie in $(\{0\} \times \mathbb{R}^N) \cup (\{1\} \times \mathbb{R}^N)$, the boundary of $I \times \mathbb{R}^N$,
where $I = [0, 1]$. In fact, one can see that
\[ H^{-1}(y) \cap (\{0\} \times \mathbb{R}^N) = \{0\} \times \{\langle [DF(0)]^{-1}, y \rangle\} \tag{6.19} \]
and
\[ H^{-1}(y) \cap (\{1\} \times \mathbb{R}^N) = \{1\} \times F^{-1}(y). \]

In (6.19), $[DF(0)]^{-1}$ is the inverse of the linear map $DF(0)$.
Now let $A \subset H^{-1}(y)$ be any arc. To each $P \in A$ we assign a continuously
differentiable unit tangent vector
\[ \lambda(P)\,e_1 + v, \tag{6.20} \]
where $e_1$ is a unit vector pointing along the $t$-axis and $v = (0, a_1, \dots, a_N)$ lies in
the space orthogonal to $e_1$. Since $H$ is constant along $A$,
\[ \langle DH(P), [\lambda(P)\,e_1 + v] \rangle = 0. \]
Therefore
\[ \lambda(P) \langle DH(P), e_1 \rangle = -\langle DH(P), v \rangle. \]
But (6.18) tells us that
\[ \langle DH(P), v \rangle = \langle DF(tx), w \rangle, \]
where $w = (a_1, \dots, a_N)$, $t = t(P)$, and $x = x(P)$. As a result
\[ \lambda(P) \langle DH(P), e_1 \rangle = -\langle DF(tx), w \rangle. \tag{6.21} \]


It follows from (6.21) that, if the hyperplane $\{t = c\}$ cuts $A$ at $P$, it does so
transversely; for, if this were not the case, then it would follow that $\lambda(P) = 0$
and therefore that $|w| = 1$; as a consequence, $DF(cx)$ would be singular. That
would be a contradiction. Furthermore, if the hyperplane $\{t = c\}$ were to cut $A$ at
a second point $Q \ne P$, then $P$ and $Q$ would divide $A$ into three subarcs, at least
one of which, call it $A'$, would have both endpoints in the hyperplane $\{t = c\}$.
Since $A'$ is bounded (because both endpoints are in the hyperplane), there must
be a maximum or minimum value of $t$ along $A'$ that is not equal to $c$. At any point
on $A'$ where that extreme value is attained, $\lambda$ would equal 0, and that would be
a contradiction. By the same reasoning, no $A$ can be a topological circle. This
completes the proof of the lemma. $\square$


We next establish some further properties of the arcs which compose the
preimages $H^{-1}(y)$.


Property 1: Let $A$ be an arc contained in $H^{-1}(y)$. If an endpoint of $A$ lies in
$\{1\} \times \mathbb{R}^N$ and if a hyperplane $\{t = c\}$ cuts $A$, then so does the hyperplane $\{t = d\}$
for any $c < d < 1$. (This follows from the connectedness of $A$.) Similarly, if an
endpoint of $A$ lies in $\{0\} \times \mathbb{R}^N$ and if a hyperplane $\{t = c\}$ cuts $A$, then so does
the hyperplane $\{t = d\}$ for any $0 < d < c$.

From Property 1, we immediately obtain the following property:


Property 2: If an endpoint of $A$ lies in $\{1\} \times \mathbb{R}^N$ and if a hyperplane $\{t = c\}$ does
not cut $A$, then neither does $\{t = d\}$ cut $A$ for any $0 < d < c$. Similarly, if an
endpoint of $A$ lies in $\{0\} \times \mathbb{R}^N$ and if a hyperplane $\{t = c\}$ does not cut $A$, then
neither does $\{t = d\}$ cut $A$ for $c < d < 1$.


Let
\[ S = \{c : \text{the hyperplane } t = c \text{ cuts } A\}. \]

Then $S$ is a bounded subset of the real numbers. Either 0 or 1 is in $S$. In case 1
is in $S$, let $c_0 = \inf S$. If there is a real number $a > 0$ such that the part of $A$
which lies in the closed region of $I \times \mathbb{R}^N$ between the hyperplanes $\{t = c_0\}$ and
$\{t = c_0 + a\}$ is bounded then, since $A$ is obviously closed, the hyperplane $\{t = c_0\}$
cuts $A$. We can reason similarly in case 0 is in $S$. We conclude that:


Property 3: Either the curve $A$ is compact or else it is asymptotic to the
hyperplane $\{t = c_0\}$.


We parameterize the arc $A$ in the following way: If the hyperplane $\{t = c\}$
cuts $A$ at $P$, then $P$ is given coordinates $(c, x_1(c), \dots, x_N(c))$. Here $x(P) =
(x_1(c), \dots, x_N(c))$. Thus we may represent $\lambda$, defined in (6.20), as a function of
$t$:
\[ \lambda(t) = \pm\bigl(1 + \dot{x}_1^2 + \dot{x}_2^2 + \cdots + \dot{x}_N^2\bigr)^{-1/2}; \tag{6.22} \]
here the sign is chosen depending on the orientation of the arc $A$ and the accent $\dot{\ }$
is used to denote differentiation in $t$.

In order to prove our first version of Hadamard’s global inverse function theo- 
rem, we first must introduce some ancillary terminology from topology. 


Definition 6.2.2 Let $X$, $Y$ be topological spaces and $g : X \to Y$ a mapping. We
say that $g$ is proper if whenever $K \subset Y$ is compact then $g^{-1}(K) \subset X$ is compact.


The notion of properness bears some discussion. It is distinct from continuity. 
A proper mapping need not be continuous. For example, the mapping 


\[ f : [0, 1] \to (0, 1) \]

given by
\[
f(x) =
\begin{cases}
x & \text{if } 0 < x < 1,\\
1/2 & \text{if } x = 0 \text{ or } 1
\end{cases}
\]

is proper but not continuous. Conversely, a continuous mapping need not be 
proper (exercise). 

When the topological spaces in question are open domains in $\mathbb{R}^N$, then there
is a useful alternative formulation of properness. Namely, let $U$, $V$ be connected
open sets (domains) in $\mathbb{R}^N$. Let $g : U \to V$ be a mapping. The mapping $g$ is
proper (according to Definition 6.2.2) if and only if whenever $\{x_j\} \subset U$ satisfies
$x_j \to \partial U$ then $g(x_j) \to \partial V$. We leave the details of the equivalence of the two
definitions as an exercise for the reader, or refer to Kelley [Ke 55].


Theorem 6.2.3 (Hadamard) Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be a $C^2$ mapping. Suppose
that $F(0) = 0$ and that the Jacobian determinant of $F$ is nonzero at each point.
Suppose furthermore that $F$ is proper. Then $F$ is one-to-one and onto.


Proof. Define $H$ as above, and consider $y \in \mathbb{R}^N$.

Let $A$ be an arc in $H^{-1}(y)$. If $A$ were not compact, then $A$ would be asymptotic
to a hyperplane $\{t = c_0\}$, $0 \le c_0 \le 1$. So there would exist sequences $t_j \to c_0$
and $x_j \in \mathbb{R}^N$ with $|x_j| \to \infty$ such that $F(t_j x_j)/t_j = y$ for all $j$. We conclude
that $F^{-1}[B(c_0 y, 1)]$ is unbounded, contradicting the assumption that $F$ is proper.

Because $DF(0)$ is nonsingular, we know that $H^{-1}(y)$ must contain an arc with
endpoint $(0, \langle [DF(0)]^{-1}, y \rangle)$. Since that arc must be compact (as we just showed)
and since by Lemma 6.2.1 each hyperplane $\{t = c\}$ must cut the arc in at most
one point, the other endpoint of the arc must be in $(1, F^{-1}(y))$. Thus $F$ maps
onto $\mathbb{R}^N$.

To see that $F$ is one-to-one, suppose there is another distinct point in $(1, F^{-1}(y))$.
That point must be the endpoint of another distinct arc in $H^{-1}(y)$. Reasoning as
above, we see that the second endpoint of this new arc must be a second distinct
point in $(0, \langle [DF(0)]^{-1}, y \rangle)$, contradicting the fact that $DF(0)$ is nonsingular. $\square$


Next we present a refined version of Hadamard’s theorem. 


Theorem 6.2.4 Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be a $C^2$ mapping. Suppose that $F(0) = 0$
and that the Jacobian determinant of $F$ is nonzero at each point. Finally, assume
that there is a constant $K$ such that
\[ |\langle [DF(x)]^{-1}, v \rangle| \le K|v| \tag{6.23} \]
holds for all $x \in \mathbb{R}^N$ and vectors $v \in \mathbb{R}^N$. Then $F$ is a $C^2$ diffeomorphism.

Proof. Proceeding as in the proof of Theorem 6.2.3, define $H$ as before, consider
$y \in \mathbb{R}^N$, and let $A$ be an arc in $H^{-1}(y)$. We need to show that $A$ is compact.

If $A$ were not compact, then $A$ would be asymptotic to a hyperplane $\{t = c_0\}$,
$0 \le c_0 \le 1$. Let $A$ be parametrized by $(t, x(t))$ either for $0 \le t < c_0$ or for
$c_0 < t \le 1$. We have
\[ F(t\,x(t)) = ty \]
so, after differentiating with respect to $t$, we obtain
\[ \langle DF(t\,x(t)),\, t\,\dot{x}(t) \rangle + \langle DF(t\,x(t)),\, x(t) \rangle = y. \tag{6.24} \]
Since $DF$ is everywhere invertible, we can rewrite (6.24) as
\[ t\dot{x} + x = \langle [DF(tx)]^{-1}, y \rangle. \tag{6.25} \]
Since $|x(t)| \to \infty$ as $t \to c_0$ and since, according to (6.23),
\[ |\langle [DF(x)]^{-1}, y \rangle| \]
is uniformly bounded by $K|y|$, there is an $\epsilon > 0$ so that $0 < |t - c_0| < \epsilon$ implies
$|\langle [DF(tx)]^{-1}, y \rangle| \le |x(t)|/2$. Thus we have
\[ t\,\dot{x} \cdot x + x \cdot x = \langle [DF(tx)]^{-1}, y \rangle \cdot x \le \tfrac12\, x \cdot x. \tag{6.26} \]
Set $m(t) = |x(t)|$. From (6.26) it follows that
\[ \dot{m}(t) \le -\tfrac12\, m(t) \tag{6.27} \]
holds for $0 < |t - c_0| < \epsilon$. Choosing a value $t_0 \ne c_0$ at distance less than $\epsilon$ from
$c_0$, we can integrate (6.27) to conclude that
\[ 0 \le m(t) \le m(t_0)\, e^{-\frac12 |t - t_0|} \]
holds for $t$ between $t_0$ and $c_0$, contradicting the assumption that $|x(t)| \to \infty$ as
$t \to c_0$.
The remainder of the proof of the theorem follows that of Theorem 6.2.3. $\square$
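
Equation (6.25) also suggests a way to compute $F^{-1}(y)$ by following the arc from $t = 0$ to $t = 1$. Writing $w(t) = t\,x(t)$, the left-hand side of (6.25) is $\dot{w}$, so $w$ solves $\dot{w} = [DF(w)]^{-1} y$ with $w(0) = 0$, and $F(w(t)) = ty$ along the way. The sketch below integrates this equation with the classical fourth-order Runge-Kutta scheme. Everything specific in it is an assumption made for illustration: the sample map `F` (chosen so that $F(0) = 0$ and the inverse Jacobian is uniformly bounded), the function names, and the step count.

```python
import math

def F(w):
    """Sample map on R^2 with F(0) = 0; an illustrative choice, not from the text."""
    x1, x2 = w
    return (x1 + 0.5 * math.sin(x2), x2 + 0.5 * math.sin(x1))

def apply_inverse_jacobian(w, y):
    """Return [DF(w)]^{-1} y for the sample map F above."""
    x1, x2 = w
    a, b = 1.0, 0.5 * math.cos(x2)
    c, d = 0.5 * math.cos(x1), 1.0
    det = a * d - b * c          # at least 0.75 for this F
    return ((d * y[0] - b * y[1]) / det, (-c * y[0] + a * y[1]) / det)

def invert_by_continuation(y, steps=200):
    """Integrate dw/dt = [DF(w)]^{-1} y from t = 0 to t = 1 (classical RK4)."""
    w = (0.0, 0.0)
    h = 1.0 / steps
    for _ in range(steps):
        k1 = apply_inverse_jacobian(w, y)
        k2 = apply_inverse_jacobian((w[0] + 0.5 * h * k1[0], w[1] + 0.5 * h * k1[1]), y)
        k3 = apply_inverse_jacobian((w[0] + 0.5 * h * k2[0], w[1] + 0.5 * h * k2[1]), y)
        k4 = apply_inverse_jacobian((w[0] + h * k3[0], w[1] + h * k3[1]), y)
        w = (w[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
             w[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)
    return w
```

Calling `invert_by_continuation((2.0, -1.0))` returns an approximate preimage `w`; applying `F` to it should reproduce the target up to the integration error.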


As an application of Hadamard’s theorem, we will give a proof of the funda- 
mental theorem of algebra. 


Theorem 6.2.5 (Fundamental Theorem of Algebra) If p(z) is a nonconstant com- 
plex polynomial, then there is a zo € C for which p(zo) = 0. 


Proof. Assume that $p(z)$ is a polynomial of degree $n \ge 1$. Let $P$ be an
anti-derivative for $p$ (choose and fix one such anti-derivative). Of course $P$ will be a
polynomial of degree $n + 1$. Consider a large circle centered at the origin. The
image under $P$ of such a circle will be a curve that encircles the origin $n + 1 \ge 2$
times. Such a curve must cross itself, for if it did not, then it would be a simple
closed curve, so by the Jordan curve theorem it would enclose a cell. Thus it could
not encircle the origin the two or more times required. In summary, the mapping
$z \mapsto P(z)$ is not one-to-one.

We conclude that there must be a $z_0 \in \mathbb{C}$ such that $P'(z_0) = p(z_0) = 0$. This
is so because, if $P'(z) = p(z)$ never vanished, then Hadamard's theorem (the
properness follows because $P(z)$ behaves like its leading term when $z$ is large)
would imply that the mapping $P$ is a diffeomorphism; and we know it is not. $\square$
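
The claim that the image of a large circle under a polynomial of degree $d$ encircles the origin $d$ times can be checked numerically by accumulating the change in argument along the image curve. In this sketch the function name `winding_number`, the coefficient ordering, and the sample count are assumptions made for illustration; the circle is traversed counterclockwise, and the samples must be fine enough that consecutive image points differ in argument by less than $\pi$.

```python
import cmath
import math

def winding_number(coeffs, radius, samples=4000):
    """Winding number about 0 of the image of |z| = radius under a polynomial.

    coeffs = [a_0, a_1, ..., a_d] for p(z) = a_0 + a_1 z + ... + a_d z^d.
    """
    def p(z):
        result = 0j
        for a in reversed(coeffs):   # Horner evaluation
            result = result * z + a
        return result

    total = 0.0
    prev = p(complex(radius, 0.0))
    for k in range(1, samples + 1):
        cur = p(radius * cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(cur / prev)   # small signed change in argument
        prev = cur
    return round(total / (2.0 * math.pi))
```

For $z^3 + 2z + 1$ and radius 10 the function returns 3, the degree of the polynomial.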


Remark 6.2.6 The reader who feels the preceding proof is too heuristic may ap- 
peal to the following two facts from elementary complex analysis: 


For a polynomial of degree d, the image of any sufficiently large 
circle centered at the origin has winding number +d about 0. 


The winding number of a Jordan curve is $\pm 1$ or $0$ about any point not
on the curve.
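
The first fact in Remark 6.2.6 can be probed numerically. The following is a minimal sketch, not part of the original argument: it sums the increments of the argument of $P(z)$ along a sampled circle. The function name `winding_number`, the sample count, and the cubic used as an example are all illustrative assumptions.

```python
import cmath
import math

def winding_number(poly_coeffs, radius, samples=20000):
    """Approximate the winding number about 0 of the image of the circle
    |z| = radius under the polynomial with the given coefficients
    (highest degree first). Assumes the polynomial has no zero on the circle."""
    def p(z):
        value = 0j
        for c in poly_coeffs:          # Horner's rule
            value = value * z + c
        return value

    total = 0.0
    prev = p(complex(radius, 0.0))
    for k in range(1, samples + 1):
        theta = 2.0 * math.pi * k / samples
        cur = p(radius * cmath.exp(1j * theta))
        # Increment of the argument between consecutive sample points.
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2.0 * math.pi))

# Hypothetical example: P(z) = z^3 - 2z + 5, a polynomial of degree d = 3.
print(winding_number([1, 0, -2, 5], radius=100.0))
```

On a large circle the leading term dominates, so the count agrees with the degree; on a small circle that encloses no zero of the polynomial the same routine returns 0.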
Another form of Hadamard’s global inverse function theorem, which we discuss
next, allows more freedom in the choice of domain and range of the function,
but involves more sophisticated topological considerations. Because more topo- 
logical background is involved, we will not prove the result here. A proof can be 
found in Gordon [Go 72]. 

We recall what it means for a topological space to be simply connected. 


Definition 6.2.7 Let $X$ be a topological space. We say that $X$ is simply connected if every closed curve in $X$ can be contracted to a point. More precisely, the requirement is that whenever $\phi : [0,1] \to X$ is a closed curve (that is, $\phi$ is continuous and $\phi(0) = \phi(1)$) there exists a continuous function $H : [0,1] \times [0,1] \to X$ such that


(1) $H(t, 0) = \phi(t)$, for all $t \in [0,1]$,
(2) $H(0,u) = H(1,u) = p$, for all $u \in [0,1]$, where $p = \phi(0) = \phi(1)$, and
(3) $H(t, 1) = p$, for all $t \in [0,1]$.


Euclidean space $\mathbb{R}^N$ is simply connected for every choice of $N \ge 1$, but the sphere $S^N$ is simply connected only when $N$ is greater than or equal to 2. An annulus or torus is never simply connected. The fact that is relevant to our application is that for $N \ge 3$ a point can be deleted from $\mathbb{R}^N$ and the space remains simply connected. In contrast, $\mathbb{R}^2 \setminus \{p\}$ is not simply connected.
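
As a worked instance of Definition 6.2.7, the standard straight-line homotopy shows why every closed curve in Euclidean space contracts to its base point. The formula below is an illustration added here, not taken from the original text; it uses only the convexity of Euclidean space.

```latex
H(t,u) = (1-u)\,\phi(t) + u\,p, \qquad (t,u) \in [0,1]\times[0,1].
% (1) H(t,0) = \phi(t)
% (2) H(0,u) = (1-u)\,p + u\,p = p = H(1,u), since \phi(0) = \phi(1) = p
% (3) H(t,1) = p
```

The same formula is not available in the plane with a point deleted, because the segment from $\phi(t)$ to $p$ may pass through the deleted point.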


Theorem 6.2.8 (Hadamard) Let $M_1$ and $M_2$ be smooth, connected $N$-dimensional manifolds and let $f : M_1 \to M_2$ be a $C^1$ function. If


(1) $f$ is proper,
(2) the Jacobian of $f$ vanishes nowhere, and
(3) $M_2$ is simply connected,

then f is a homeomorphism. 


Based on ideas in Gordon [Go 77], we now give an application of this theorem 
of Hadamard to a fundamental question of differential topology. 

The real line, $\mathbb{R}^1$, is canonically identified with the field of real numbers, and $\mathbb{R}^2$ is identified with the field of complex numbers via the Argand diagram (1806). Hamilton (1805-1865) showed that $\mathbb{R}^4$ can be endowed with the algebraic structure of the quaternions (1843), and Cayley (1821-1895) showed that $\mathbb{R}^8$ carries the algebraic structure known as the octonions or Cayley numbers (1845). It is natural to ask which other Euclidean spaces can be equipped with a field structure, or a division ring structure, or something similar.¹


¹ Of course the algebraic operations must be required to be smooth, or every $\mathbb{R}^N$ can be made a field trivially by using the one-to-one, onto mapping from set theory that demonstrates that $\mathbb{R}^1$ and $\mathbb{R}^N$ have the same cardinality.
While these questions were investigated in the nineteenth century (see Kline [Kl 72]), we can now say definitively (and this is one of the great triumphs of twentieth-century mathematics) that the examples above form the complete list: only dimensions 1, 2, 4, 8 can have the sort of structure we seek. And the standard structures (reals, complex numbers, quaternions, and Cayley numbers) are the unique such structures. The precise result is that $\mathbb{R}^n$ is a (possibly nonassociative) normed division algebra only for $n = 1, 2, 4, 8$. This result is the work of J. Frank Adams [Ad 60], and it relies on Adams's analysis of the Steenrod algebra. While it would not be appropriate to discuss Steenrod algebras here, we can in fact use the ideas developed in the present section to show that there is no (commutative) division ring structure on $\mathbb{R}^N$ for $N \ge 3$. We now turn to that task.

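In the following theorem, we use the arbitrarily chosen symbol $\odot$ to denote a hypothetical operation on $\mathbb{R}^N$.


Theorem 6.2.9 For $N \ge 3$ there is no operation $\odot$ of multiplication on $\mathbb{R}^N$ which satisfies the following axioms (for $x, y, z \in \mathbb{R}^N$ and $\lambda \in \mathbb{R}$):


(1) $x \odot (\lambda y) = (\lambda x) \odot y = \lambda (x \odot y)$,
(2) $x \odot (y + z) = x \odot y + x \odot z$,
(3) $x \odot y = 0 \implies x = 0$ or $y = 0$,
(4) $x \odot y = y \odot x$.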


Proof. By Axioms (1), (2), and (4), we see that if $x = (x_1, x_2, \ldots, x_N)$ and $y = (y_1, y_2, \ldots, y_N)$, then
$$x \odot y = \sum_{i,j=1}^{N} x_i y_j\,(e_i \odot e_j), \tag{6.28}$$
where $e_i$, $e_j$ are elements of the standard basis.
Let $G : \mathbb{R}^N \to \mathbb{R}^N$ be given by $G(x) = x \odot x$. By (6.28), $G$ is smooth and we can calculate that
$$\langle DG(x), v \rangle = 2\, x \odot v, \tag{6.29}$$
so by Axiom (3), $DG$ is nonsingular on $\mathbb{R}^N \setminus \{0\}$.

The restriction of $G$ to $g : \mathbb{R}^N \setminus \{0\} \to \mathbb{R}^N \setminus \{0\}$ is continuous and is easily shown to be proper. By Axiom (3) and (6.29), we conclude that the Jacobian of $g$ vanishes nowhere on $\mathbb{R}^N \setminus \{0\}$. Since $\mathbb{R}^N \setminus \{0\}$ is simply connected when $N \ge 3$, $g$ is a homeomorphism by Theorem 6.2.8 above. But Axiom (1) entails that $(-x) \odot (-x) = x \odot x$, so $g$ is not one-to-one, which is absurd. This contradiction establishes the result. □
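
To make (6.29) and the final step of the proof concrete, here is a small sketch in dimension 2, where complex multiplication does satisfy Axioms (1) through (4). This does not contradict the theorem, which requires $N \ge 3$. The helper names `mul`, `G`, and `directional_derivative` and the sample vectors are assumptions chosen for illustration.

```python
def mul(x, y):
    """Complex multiplication written on pairs of reals (an operation on R^2)."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def G(x):
    """The squaring map G(x) = x (.) x from the proof."""
    return mul(x, x)

def directional_derivative(x, v, h=1e-6):
    """Central-difference approximation of <DG(x), v>."""
    xp = (x[0] + h * v[0], x[1] + h * v[1])
    xm = (x[0] - h * v[0], x[1] - h * v[1])
    gp, gm = G(xp), G(xm)
    return ((gp[0] - gm[0]) / (2 * h), (gp[1] - gm[1]) / (2 * h))

x, v = (1.5, -0.5), (0.25, 2.0)
approx = directional_derivative(x, v)
exact = tuple(2 * c for c in mul(x, v))   # right-hand side of (6.29)
print(approx, exact)
print(G(x) == G((-x[0], -x[1])))          # G takes the same value at x and -x
```

The second printed line illustrates why $G$ cannot be one-to-one away from the origin; in dimension 2 no contradiction arises because the punctured plane is not simply connected.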
6.3 The Implicit Function Theorem via the Newton-Raphson Method


We have already presented the implicit function theorem from the classical point of view of estimates (Section 3.3) and from the point of view of fixed point theory (Section 3.4). One of the most useful approaches to the theorem is by way of the so-called Newton-Raphson (Joseph Raphson: 1648-1715) method. This approach has the advantage of universality: it works in any norm. As a result, one sees immediately that the implicit function theorem is valid not only in $C^k$ but in other function spaces such as Sobolev spaces, Lipschitz spaces, Besov spaces, and so forth. We follow the approach of Cesari [Ce 66]. We begin with the setup for our theorem.


Notation 6.3.1
(1) Let $Y$ be a Hausdorff, locally convex, topological vector space.
(2) Suppose $Y$ has the property that if a sequence $\{y_n\}_{n=1,2,\ldots}$ is such that, for any neighborhood $V$ of the origin, there is an $M$ for which
$$M < i \text{ and } M < j \text{ imply } y_i - y_j \in V, \tag{6.30}$$
then the sequence converges to some $y \in Y$.
(3) Let $Y_0 \subset Y$.
(4) Let $F$ be another linear space. Suppose that we are given a functional $f : Y_0 \to F$. We assume an “approximate” solution $y_0$ of the equation $f(y) = 0$ is given.
(5) Let $\mathcal{V} = \{V_\alpha\}_{\alpha \in \mathcal{A}}$ be a neighborhood basis of $0 \in Y$, that is, every neighborhood of $0 \in Y$ contains an element of $\mathcal{V}$. Suppose that if $V \in \mathcal{V}$ and $\lambda > 0$, then $\lambda V \in \mathcal{V}$.
(6) We further suppose that each element of $\mathcal{V}$ is balanced, absorbent, and convex.²
(7) Let $S$ be a closed, convex subset of $Y_0$ such that $S_0 = S - y_0$ is balanced.


Definition 6.3.2 Condition (2) will hold whenever $Y$ is a complete metric space,³ so by analogy we will say that a sequence $\{y_n\}_{n=1,2,\ldots}$ that satisfies (6.30) is a generalized Cauchy sequence.


² A set $B$ is balanced if $cB \subseteq B$ holds for all scalars $c$ with $|c| \le 1$. A set $A$ is absorbent if $Y = \bigcup_{t>0} tA$.

³ There is a generalization of the notion of “completeness” that can be applied to non-metrizable spaces. Our condition (2) also will hold for such complete spaces. The interested reader should consult Arkhangel'skiĭ and Fedorchuk [AF 90], Bourbaki [Bo 89], Page [Pa 78], or Zeidler [Ze 86]. The latter reference is specifically intended for the context of functional analysis.
Theorem 6.3.3 We use the notation and machinery from above. Assume that there are numbers $k_0$, $k$ with $0 < k_0 < 1$, $0 < k < 1 - k_0$ and linear operators $B : F \to Y$ and $A : Y \to F$ with the following properties: We suppose that $B$ has trivial null space and that, whenever $y_1, y_2 \in S$ and $y_1 - y_2 \in V \in \mathcal{V}$, it holds that
$$B[f(y_1) - f(y_2) - A(y_1 - y_2)] \in k_0(S_0 \cap V), \tag{6.31}$$
$$Bf(y_0) \in kS_0, \tag{6.32}$$
$$BA = I. \tag{6.33}$$
If $T : S \to Y$ is given by
$$Ty = y - Bf(y) \tag{6.34}$$
for $y \in S$, then there is one and only one element $\psi \in S$ with $f(\psi) = 0$.


In the proof, we shall use a fixed point construction to establish the existence of $\psi$. After proving the theorem in detail, we shall put the result in context and illustrate it with some examples.


Proof. First note that the identity $\psi = T\psi$ implies, by (6.34), that $Bf(\psi) = 0$. Since the null space of $B$ is zero, we conclude that $f(\psi) = 0$. Conversely, if $f(\psi) = 0$ with $\psi \in S$ then $\psi = T\psi$. In conclusion, a fixed point of $T$ is just the same as a root of $f$.

Properties (6.34) and (6.32) tell us that
$$Ty_0 - y_0 = -Bf(y_0) \in kS_0. \tag{6.35}$$
If $y_1, y_2 \in S$ and $V_0 \in \mathcal{V}$, then the absorbency property tells us that there is a $\lambda > 0$ such that $y_1 - y_2 \in \lambda V_0$. Furthermore, we have $\lambda V_0 \in \mathcal{V}$. Now, taking $V = \lambda V_0$ and noting that $BA = I$, we see that (6.31), (6.33), and (6.34) tell us that
$$Ty_1 - Ty_2 = -B[f(y_1) - f(y_2) - A(y_1 - y_2)] \in k_0 S_0. \tag{6.36}$$
Trivially,
$$Ty - y_0 = [Ty - Ty_0] + [Ty_0 - y_0] \tag{6.37}$$
holds for each $y \in S$. By (6.36) the first term on the right-hand side of (6.37) lies in $k_0 S_0$, and, by (6.35), the second term on the right-hand side of (6.37) lies in $kS_0$. Thus we see that $Ty - y_0$ lies in $(k + k_0)S_0 \subset S_0$. As a result, $T : S \to S$.

Now we set up an iteration scheme in the spirit of the Newton-Raphson method. Let $y_{n+1} = Ty_n$ for $n = 0, 1, 2, \ldots$. Of course $y_0$ is already given. Then $y_n \in S$ for all $n$. Let $V \in \mathcal{V}$. Again, by absorbency, there is a $\lambda > 0$ such that $y_1 - y_0 \in \lambda V$. And of course $\lambda V \in \mathcal{V}$. Our key “estimate” is to show that
$$y_{n+1} - y_n \in k_0^n \lambda V, \qquad n = 0, 1, 2, \ldots. \tag{6.38}$$
This we now do.


The relationship (6.38) is plainly true when $n = 0$. Suppose inductively that (6.38) has been established for $0, 1, \ldots, n - 1$. We next prove the assertion for $n$. In point of fact, we have from (6.31) and the identity $BA = I$ that
$$y_{n+1} - y_n = -B[f(y_n) - f(y_{n-1}) - A(y_n - y_{n-1})] \in k_0(k_0^{n-1}\lambda V) = k_0^n \lambda V.$$
Thus (6.38) is now proved for every $n$. Iterating, we have that
$$y_{n+p} - y_n = (y_{n+1} - y_n) + (y_{n+2} - y_{n+1}) + \cdots + (y_{n+p} - y_{n+p-1}). \tag{6.39}$$
We see that $y_{n+p} - y_n$ lies in
$$k_0^n \lambda V + k_0^{n+1}\lambda V + \cdots + k_0^{n+p-1}\lambda V \subset (1 - k_0)^{-1} k_0^n \lambda V, \tag{6.40}$$
where, of course, $k_0 < 1$. Now, given any $V \in \mathcal{V}$, there is some $\bar{n}$ such that if $n \ge \bar{n}$ and $p \ge 0$, then $(1 - k_0)^{-1} k_0^n \lambda < 1/2$ and
$$y_{n+p} - y_n \in (1/2)V \subset \overline{(1/2)V} \subset V.$$
We conclude that $\{y_n\}$ is a generalized Cauchy sequence in $Y$. So by hypothesis, $\psi = \lim_{n \to \infty} y_n$ exists.

Note that $\psi \in Y$ and $\psi \in S$. Finally, (6.40) implies that $\psi - y_n$ belongs to the closure of $[(1 - k_0)^{-1} k_0^n \lambda] V$ when $n \ge \bar{n}$. Thus, for $n \ge \bar{n}$, we see that
$$\begin{aligned} \psi - T\psi &= (\psi - y_{n+1}) + (y_{n+1} - Ty_n) + (Ty_n - T\psi) \\ &\in \tfrac{1}{2}V + k_0 (1 - k_0)^{-1} k_0^n \lambda V \\ &\subset \tfrac{1}{2}V + \tfrac{1}{2}V = V. \end{aligned}$$
Here $V$ is an arbitrary element of $\mathcal{V}$. Since $Y$ is Hausdorff, we conclude that $\psi - T\psi = 0$; in other words, $\psi$ is a fixed element of $T$. This establishes existence. The uniqueness of $\psi$ in $S$ now follows from this standard argument:

If $y$, $z$ are fixed points of $T$, then $y = T^n y$ and $z = T^n z$ for every $n$. If $V \in \mathcal{V}$ and $\lambda$ is chosen so that $y - z \in \lambda V$, then (6.31) applied to $y$ and to $z$ and their iterates gives
$$y - z = T^n y - T^n z \in k_0^n \lambda V$$
for every $n$. We may choose $n$ so large that $|k_0^n \lambda| < 1$. Thus $y - z \in V$ for every $V \in \mathcal{V}$. Since $Y$ is Hausdorff, we conclude that $y - z = 0$. □
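
The iteration $y_{n+1} = Ty_n$ with $Ty = y - Bf(y)$ can be tried out in the simplest possible setting. The sketch below is an illustration under strong simplifying assumptions, not Cesari's general construction: $Y = F = \mathbb{R}$, the operator $A$ is the derivative of $f$ at the initial guess, and $B = A^{-1}$ is held fixed throughout (so $BA = I$). The function name `chord_iteration` and the test equation are hypothetical.

```python
import math

def chord_iteration(f, B, y0, steps=50):
    """Iterate y_{n+1} = T y_n with T y = y - B * f(y), keeping B fixed."""
    y = y0
    for _ in range(steps):
        y = y - B * f(y)
    return y

f = lambda y: y * y - 2.0   # hypothetical equation f(y) = 0
y0 = 1.5                    # the "approximate" solution
A = 2.0 * y0                # derivative of f at y0, playing the role of A
B = 1.0 / A                 # so that B * A = 1
root = chord_iteration(f, B, y0)
print(root, math.sqrt(2.0))
```

Because $B$ is never updated, the convergence is geometric rather than quadratic, which mirrors the factor $k_0^n$ in (6.38).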


In fact, it can be shown that $\psi$ depends continuously on parameters. Such information is crucial in many applications. We now formulate and prove such a result. Some preliminaries are required. We continue to use Notation 6.3.1, but extend it as follows:

Notation 6.3.4 Let $Y_0 \subset Y$ as before, let $Z$ be a locally convex topological vector space and let $Z_0 \subset Z$ be any subset. Assume that $f : Y_0 \times Z_0 \to F$. Let $\mathcal{W} = \{W_\alpha\}_{\alpha \in \mathcal{A}}$ be a neighborhood basis of $0 \in Z$. We let $\Sigma$ denote some closed subset of $Z$ which is contained in $Z_0$.

We fix a closed, convex subset $S \subset Y$ as before, and begin with an initial approximate solution $y_0$. Assume that there are numbers $k_0$, $k$ with $0 < k_0 < 1$ and $0 < k < 1 - k_0$ and linear operators $B : F \to Y$ and $A : Y \to F$, $B$ having trivial null space, such that if $y_1, y_2 \in S$ and $y_1 - y_2 \in V \in \mathcal{V}$, then
$$B[f(y_1) - f(y_2) - A(y_1 - y_2)] \in k_0(S_0 \cap V), \tag{6.31}$$
$$Bf(y_0) \in kS_0, \tag{6.32}$$
$$BA = I. \tag{6.33}$$


We make the following standing hypothesis: 


H1: We assume that (6.31), (6.32), and (6.33) hold uniformly for any choice of $z \in \Sigma \subset Z_0$ (we think here of $z$ as a parameter and of $y$ as the operative variable). Both operators $A = A_z$ and $B = B_z$ will, in general, depend on the variable $z$.


By the theorem we have already established, for every $z \in \Sigma$ there is a unique $\psi = \psi_z \in S$ such that $f(\psi_z, z) = 0$. This $\psi_z$ is the fixed point of the map $T_z : S \to S$ defined by $T_z y = y - B_z f(y, z)$. In other words, $\psi_z = T_z \psi_z$ with $\psi_z \in S$ and $z \in \Sigma$. Let $\tau : \Sigma \to S$ be defined by $\psi_z = \tau(z)$ for $z \in \Sigma$ and $\psi_z \in S$. We will invoke the following additional hypothesis:


H2: Given $V \in \mathcal{V}$ there is some $W \in \mathcal{W}$ such that if $z_1, z_2 \in \Sigma$ with $z_1 - z_2 \in W$ and if $y \in S$, then
$$B_{z_1} f(y, z_1) - B_{z_2} f(y, z_2) \in V. \tag{6.41}$$
We also assume the analogous hypothesis that, given $V \in \mathcal{V}$ and any compact subset $C'$ of $Z$, there is some $W \in \mathcal{W}$ such that $z_1, z_2 \in \Sigma \cap C'$, $z_1 - z_2 \in W$, $y \in S$ implies (6.41).


Theorem 6.3.5 Under the hypotheses H1 and H2, $\tau : \Sigma \to S$ is uniformly continuous on $\Sigma$.


Proof. Let $V \in \mathcal{V}$ and $V_0 = \tfrac{1}{2}(1 - k_0)V$. By H2, there is an element $W \in \mathcal{W}$ such that $z_1, z_2 \in \Sigma$ with $z_1 - z_2 \in W$, $y \in S$ implies
$$B_{z_1} f(y, z_1) - B_{z_2} f(y, z_2) \in V_0.$$
We iteratively define
$$y_{i,n+1} = T_{z_i}\, y_{i,n},$$
for $n = 0, 1, 2, \ldots$ and $i = 1, 2$, with $y_{1,0} = y_{2,0} = y_0$. Then
$$y_{i,1} = T_{z_i} y_0 = y_0 - B_{z_i} f(y_0, z_i), \qquad i = 1, 2,$$
hence
$$y_{1,1} - y_{2,1} = -B_{z_1} f(y_0, z_1) + B_{z_2} f(y_0, z_2) \in V_0.$$
We next show that
$$y_{1,n} - y_{2,n} = T_{z_1}^n y_0 - T_{z_2}^n y_0 \in (1 + k_0 + \cdots + k_0^{n-1})V_0, \qquad n = 1, 2, \ldots. \tag{6.42}$$
This assertion is certainly true for $n = 1$. Let us assume that (6.42) is true for $1, 2, \ldots, n$ and then prove it for $n + 1$. In fact
$$\begin{aligned} y_{1,n+1} - y_{2,n+1} &= -B_{z_1} f(y_{1,n}, z_1) + B_{z_2} f(y_{2,n}, z_2) + y_{1,n} - y_{2,n} \\ &= -B_{z_1}\big[f(y_{1,n}, z_1) - f(y_{2,n}, z_1) - A_{z_1}(y_{1,n} - y_{2,n})\big] \\ &\quad - B_{z_1} f(y_{2,n}, z_1) + B_{z_2} f(y_{2,n}, z_2). \end{aligned}$$
As a result,
$$y_{1,n+1} - y_{2,n+1} \in k_0(1 + k_0 + \cdots + k_0^{n-1})V_0 + V_0 = (1 + k_0 + \cdots + k_0^n)V_0.$$
Inductively, (6.42) is now proved for every $n$.

As $n \to \infty$, we find that $\psi_{z_1} - \psi_{z_2}$ belongs to the closure of $(1 - k_0)^{-1} V_0$, hence to the closure of $\tfrac{1}{2}V$. Thus $\psi_{z_1} - \psi_{z_2}$ certainly belongs to $V$. In summary, we have proved this: Given $V \in \mathcal{V}$, there is a $W \in \mathcal{W}$ such that if $z_1, z_2 \in \Sigma$ and $z_1 - z_2 \in W$ then $\psi_{z_1} - \psi_{z_2} \in V$. Our result is thus proved. □


In case the spaces are actually Banach spaces, the proof becomes more streamlined (see, for example, Leach [Le 61]), and, in fact, one can prove a more refined result. We state here, without elaboration, one such theorem. We begin with some notation.

Let $U$, $V$ be Banach spaces. Let $f : U \to V$ be a function. A strong differential of the function $f$ at a point $x_0 \in U$ is a bounded linear transformation $\alpha : U \to V$ which approximates the change in $f$ in the following strict sense: For every $\epsilon > 0$ there is a $\delta > 0$ such that if $x'$ and $x''$ satisfy $\|x' - x_0\| < \delta$ and $\|x'' - x_0\| < \delta$ then
$$\|f(x') - f(x'') - \alpha(x' - x'')\| \le \epsilon \|x' - x''\|.$$
Now we have:


Theorem 6.3.6 Let $U$, $V$, $f$ be as above. Assume that $f(0) = 0$ and that $f$ has a strong differential $\alpha$ at $0$. Let $\beta : V \to U$ be a bounded linear transformation such that $\beta\alpha\beta = \beta$. Then there is a function $g : V \to U$ such that $g(0) = 0$, $g$ has strong differential $\beta$ at $0$, and $g$ satisfies (for $y$ near $0$) the identities:


(1) $\beta(f(g(y))) = \beta(y)$;
(2) $\beta(\alpha(g(y))) = g(y)$;
(3) $g(f(\beta(y))) = \beta(y)$.


Any two functions $g$ satisfying these three conditions are identical for $y$ near $0$.

It is noteworthy that this new theorem does not mandate continuous differentiability as in the classical implicit function theorem. In fact, we only require differentiability at a single point! Observe also that the invertibility condition on the derivative at this single point is rather weaker than the classical condition. We shall say no more about Leach's result at this time.

A technical result of the type we have been considering is best understood by 
way of an example, and it turns out that an entire genre of examples is now easy 
to describe. 


Example 6.3.7 Let $U$, $V$ each be the same standard function space: the Lipschitz-$\alpha$ functions or the $C^{k,\alpha}$ functions or the Sobolev-$s$ functions or a Besov space or $L^p$ space. Let $f : U \to V$ be a mapping such that $f(0) = 0$ and suppose that $f$ has a strong differential $\alpha$ at $0$. Further suppose that $\alpha$ is invertible. Then
the theorem tells us that f has a local inverse near 0. Note in particular that if 
f : RN — RY" is C**, in the classical sense with k > 1, at the origin and if 
the ordinary Jacobian of f at 0 is invertible (also in the classical sense) then the 
mapping T : C“ 5 y go f is a bounded mapping of C*” to C and has 
a strong differential at 0 which is invertible in the sense of our Theorem 6.3.6. As 
a result, the mapping $T$ is locally invertible, which means that $f$ is invertible in the category of $C^{k,\alpha}$. A similar analysis may be applied in Sobolev spaces and the other spaces we have mentioned. □


The example shows that Theorem 6.3.6 gives us an inverse function theorem (and hence, by our standard syllogism, an implicit function theorem) in any of our familiar Banach spaces of functions. We have seen in Sections 2.4 and 6.1 that there are also implicit function theorems in the real analytic and holomorphic categories. Such theorems can be quite useful in practice, and are not so easy to pry out of the literature.


6.4 The Nash-Moser Implicit Function Theorem


6.4.1 Introductory Remarks 


A big open problem during the last quarter of the 19th century and the first half of the 20th century was the question of whether an arbitrary Riemannian manifold can be isometrically imbedded in Euclidean space.⁴ Of course the celebrated theorem of Hassler Whitney [Wh 35] tells us that the manifold can be imbedded as a smooth “surface”; it does not tell us that the metric will be preserved under the imbedding. It is the metric-preserving issue that Nash's theorem addresses.

The crucial technical tool that the proof of the Nash theorem showcases is a new type of implicit function theorem. Recall that our functional analysis proof of the classical implicit function theorem (Theorem 3.4.10) used the contraction mapping fixed point theorem; and that fixed point theorem is proved by iteration. Just so, the Nash theorem is proved by an iteration procedure. In point of fact, the proof is modeled on Newton's method from calculus. But what is special about Nash's argument is that there is a very clever “smoothing” that takes place at each stage of the iteration.


⁴ The introduction of Nash [Na 56] discusses the history of the imbedding problem for Riemannian manifolds.

Nash's paper [Na 56] is a tour de force of mathematical analysis. In the introduction, Nash observed that the perturbation procedure used in the paper did not seem limited to just the imbedding problem, and that view proved correct when Jürgen Moser (in Moser [Mo 61]) isolated an implicit function theorem from Nash's celebrated paper and presented it as a succinct tool of wide applicability. As a result, the theorem is known today as the Nash-Moser implicit function theorem. It is part of the standard toolkit of geometric analysis and nonlinear partial differential equations.

In this book we shall formulate and prove a version of the Nash-Moser theorem. It would take us too far afield to attempt to discuss the isometric imbedding of Riemannian manifolds. We refer the interested reader to Nash [Na 56] and Schwartz [Sc 69]. The reader may also be interested in the extensive survey of Hamilton [Ha 82], the notes of Hörmander [Ho 77], and the exposition by Saint Raymond [SR 89]. Our presentation is based on that in Schwartz [Sc 69].


6.4.2 Enunciation of the Nash-Moser Theorem


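Notation 6.4.1 For $\ell$ a nonnegative integer, we define $C^\ell(\mathbb{R}^N)$ to be the space of $\ell$-times continuously differentiable functions on $\mathbb{R}^N$. We equip $C^\ell(\mathbb{R}^N)$ with the norms
$$\|u\|_r = \|u\|_{C^r} = \max_{|\alpha| \le r}\ \sup_{x \in \mathbb{R}^N} |D^\alpha u(x)|,$$
where $r \le \ell$. Clearly, if $r \le \rho$, then $\|u\|_r \le \|u\|_\rho$ holds.
We can now state the theorem.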
We can now state the theorem. 


Theorem 6.4.2 Set $P = 61$. Let $B^k$ be the unit ball in the space $C^k(\mathbb{R}^N)$ of $k$-times continuously differentiable functions on $\mathbb{R}^N$. Suppose that $T : B^k \to C^{k-m}(\mathbb{R}^N)$ for some $0 \le m \le k$. We make the following technical hypotheses:


(1) The function $T$ has two continuous Fréchet derivatives (see 3.4.7), both bounded by a constant $M \ge 1$;


(2) There is a map $L$ with domain $B^k$ and range the space $\mathcal{L}(C^k(\mathbb{R}^N), C^{k-m}(\mathbb{R}^N))$ of bounded linear operators⁵ from $C^k(\mathbb{R}^N)$ to $C^{k-m}(\mathbb{R}^N)$ such that

(2a) $\|L(u)h\|_{k-m} \le M\|h\|_k$, $\forall u \in B^k$, $h \in C^k(\mathbb{R}^N)$;
(2b) $DT(u)L(u)h = h$, $\forall u \in B^k$, $h \in C^k$;
(2c) $\|L(u)T(u)\|_{k+9m} \le M(1 + \|u\|_{k+10m})$, $\forall u \in C^{k+10m}(\mathbb{R}^N)$.


Conclusion. If
$$\|T(0)\|_{k+9m} \le 2^{-P} M^{-5P-2}, \tag{6.43}$$
then $T(B^k)$ contains the origin.


⁵ In the arguments that follow, we will apply this defining property of $L$ for various values of $k$. Thus we are thinking of $L$ as taking values in a space of pseudodifferential operators, which map $C^k$ to $C^{k-m}$ for every $k$.


This is a new type of theorem, different from our earlier implicit function the- 
orems, and its statement bears some discussion. 

First, we hypothesize bounds on both the first and the second derivative of the function $T$ being studied.

Second, the function $L$ plays the role of the inverse of the derivative of $T$,
although this property is not formulated explicitly. 

Third, the hypothesis (2b) of Theorem 6.4.2 is the familiar one about the in- 
vertibility of the “derivative” L. 

Fourth, hypotheses (2a) and (2c) of Theorem 6.4.2 are new. These are the 
smoothness hypotheses that give us leverage in the iteration scheme. 

And now let us look at the conclusion. It says that the image of the operator 
contains the origin. How does this relate to our more familiar notion of solving 
for a variable, or of inverting a mapping? Of course the answer is that there is 
nothing special about the origin. A simple topology/logic argument shows that 
the image contains a neighborhood of the origin, and this statement in turn tells 
us that each element near the origin has a preimage under $T$. And of course that
assertion amounts to solving for one variable in terms of another.


6.4.3 First Step of the Proof of Nash-Moser 


The chief technical device that will be needed in the proof of the Nash-Moser theorem is the type of iteration illustrated in the following proposition.


Proposition 6.4.3 Let $X$ be a Banach space, and suppose that $T$ is a mapping whose domain is the unit ball $B$ in $X$. Let us assume that


(A) The operator $T$ has two continuous Fréchet derivatives in $B$, both bounded by a constant $M$. For technical reasons, we assume that $M \ge 2$.


(B) There is a map $L$ with domain $B$ and range the space $\mathcal{L}(X, X)$ of bounded linear mappings of $X$ to itself and having the properties

(B1) $\|L(u)h\| \le M\|h\|$, $\forall h \in X$, $u \in B$;
(B2) $DT(u)L(u)h = h$, $\forall h \in X$, $u \in B$.


Conclusion. If $\|T(0)\| \le M^{-5}$, then it follows that $T(B)$ contains the origin.

Remark 6.4.4 Just as the contraction mapping fixed point theorem was a paradigm for the classical implicit function theorem, so this proposition is a paradigm for the Nash-Moser theorem that we will prove a bit later.


Proof. Set
$$\lambda = \frac{3}{2}, \qquad \beta = (8/3)\log M. \tag{6.44}$$
It is in this argument that we will utilize the first iteration scheme. Set $u_0 = 0$. Put
$$u_{n+1} = u_n - L(u_n)T(u_n). \tag{6.45}$$
We will prove inductively that
$$u_{n-1} \in B \tag{$P_1[n]$}$$
and
$$\|u_n - u_{n-1}\| \le e^{-\beta\lambda^n} \tag{$P_2[n]$}$$
hold for all $n \ge 1$.
Condition $P_1[1]$ is trivial. Condition $P_2[1]$ simply says that
$$\|L(0)T(0)\| \le e^{-\beta\lambda}, \tag{6.46}$$
and this in turn will be implied by $M\|T(0)\| \le e^{-\beta\lambda}$. We know that $\|T(0)\| \le M^{-5}$, so it suffices to show $M^{-4} \le e^{-\beta\lambda}$ or, equivalently, $\beta\lambda \le 4\log M$, and the latter follows from (6.44).

Arguing by induction, we suppose that $P_1[j]$ and $P_2[j]$ are true for all $j \le n$. Then using $P_2[j]$ (and the inequality $(3/2)^j \ge j/2$ that we get from the binomial theorem), we estimate
$$\|u_n\| \le \sum_{j=1}^{n} \|u_j - u_{j-1}\| \le \sum_{j=1}^{\infty} e^{-\beta\lambda^j} \le \sum_{j=1}^{\infty} e^{-\beta(\lambda - 1)j} = \frac{e^{-\beta(\lambda-1)}}{1 - e^{-\beta(\lambda-1)}} \le 1,$$
where the last inequality holds because (6.44) implies $2\log 2 \le \beta$. It follows that $P_1[n+1]$ holds. As a result, the specification (6.45) makes sense.

Now if $g$ is any twice continuously Fréchet differentiable function then the mean-value theorem with Lagrange's remainder term may be applied to $g(u + th)$ to yield (with $d^2g$ denoting the second Fréchet derivative of $g$)
$$g(u + h) = g(u) + dg(u)h + \int_0^1 (1 - t)\, d^2g(u + th)(h, h)\, dt. \tag{6.47}$$
We apply the induction hypothesis, together with the estimates in (B), to obtain
$$\begin{aligned} \|u_{n+1} - u_n\| &= \|L(u_n)T(u_n)\| \\ &\le M\|T(u_n)\| \\ &\le M\|T(u_{n-1}) - DT(u_{n-1})L(u_{n-1})T(u_{n-1})\| + M^2\|u_n - u_{n-1}\|^2 \\ &= M^2\|u_n - u_{n-1}\|^2 \\ &\le M^2 e^{-2\beta\lambda^n}. \end{aligned}$$
Notice that, in the last equality, we used (B2) with $h = T(u_{n-1})$.
By (6.44) we have $M^2 = e^{3\beta/4}$, so
$$M^2 e^{-2\beta\lambda^n} = e^{\beta((3/4) - 2\lambda^n)} \le e^{\beta((1/2)\lambda^n - 2\lambda^n)} = e^{-\beta\lambda^{n+1}},$$
which proves $P_2[n+1]$.

It is easy to see from $P_2[n]$, $n = 1, 2, \ldots$, that $\{u_n\}$ is a Cauchy sequence, so $u_n$ converges to some element $u \in B$ as $n \to \infty$. By (B2) and (6.45), we have
$$T(u_n) = DT(u_n)(u_n - u_{n+1}),$$
hence, by $P_2[n+1]$,
$$\|T(u_n)\| \le M\|u_{n+1} - u_n\| \le M e^{-\beta\lambda^{n+1}}.$$
Therefore, letting $n \to \infty$, we see that $T(u) = 0$. □
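
The scheme (6.45) is Newton's method with $L(u)$ standing in for the inverse of $DT(u)$. The following sketch runs it for a scalar map, purely as an illustration: the map `T`, the choice `L(u) = 1 / DT(u)`, and the helper name `newton_iterates` are assumptions made here, and no attempt is made to match the constants $M$, $\lambda$, $\beta$ of the proposition.

```python
def newton_iterates(T, L, u0=0.0, steps=6):
    """Return [u_0, u_1, ...] where u_{n+1} = u_n - L(u_n) * T(u_n)."""
    iterates = [u0]
    for _ in range(steps):
        u = iterates[-1]
        iterates.append(u - L(u) * T(u))
    return iterates

T = lambda u: u + 0.25 * u * u - 0.001   # hypothetical map with T(0) small
L = lambda u: 1.0 / (1.0 + 0.5 * u)      # inverse of DT(u) = 1 + u/2
us = newton_iterates(T, L)
steps_taken = [abs(b - a) for a, b in zip(us, us[1:])]
print(us[-1], T(us[-1]))
print(steps_taken)
```

The successive step sizes shrink roughly like the square of the previous one, which is the behavior that the super-exponential bound in $P_2[n]$ records.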


6.4.4 The Crux of the Matter 


We begin with some technical definitions and terminology.


For a constant $M$ sufficiently large, we define a family of smoothing operators $S(t)$ acting on $C^{k-m}$ functions and producing $C^{k+10m}$ functions; these will have the following properties:


(A) $\|S(t)u\|_\rho \le M t^{\rho - r}\|u\|_r$, $u \in C^r$,

(B) $\|(I - S(t))u\|_r \le M t^{r - \rho}\|u\|_\rho$, $u \in C^\rho$,

where
$$k - m \le r \le \rho \le k + 10m.$$

Note that taking $r = k - m$ and $\rho = k + 10m$ in (A) justifies calling $S$ a smoothing operator. We shall later construct a family of smoothing operators using standard tricks from mathematical analysis. For now we simply assume that such a family exists.


Proof of the Nash-Moser Theorem 6.4.2. Set
$$\lambda = \frac{3}{2}, \qquad \mu = \frac{9}{4}, \qquad \beta = \frac{8}{m}\log(2M^5). \tag{6.48}$$
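Again, we use an iteration scheme. Let $u_0 = 0$. Define
$$S_n = S(e^{\beta\lambda^n}), \tag{6.49}$$
$$u_{n+1} = u_n - S_n L(u_n) T(u_n). \tag{6.50}$$
We will prove by induction that
$$u_n \in B^k, \tag{$Q_1[n]$}$$
$$\|u_n - u_{n-1}\|_k \le e^{-\mu m \beta \lambda^n}, \tag{$Q_2[n]$}$$
and
$$1 + \|u_n\|_{k+10m} \le e^{\mu m \beta \lambda^n} \tag{$Q_3[n]$}$$
hold for $n = 1, 2, \ldots$.
Condition $Q_1[1]$ is trivial. Notice that, by (6.50), (6.49), (A) with $\rho = k$ and $r = k - m$, (2a) of Theorem 6.4.2, and (6.43), it holds that
$$\begin{aligned} \|u_1 - u_0\|_k &= \|S_0 L(0) T(0)\|_k \\ &= \|S(e^{\beta}) L(0) T(0)\|_k \\ &\le M e^{m\beta}\|L(0)T(0)\|_{k-m} \\ &\le M^2 e^{m\beta}\|T(0)\|_k \\ &\le M^2 e^{m\beta}\|T(0)\|_{k+9m} \\ &\le 2^{-P} M^{-5P} e^{m\beta} = \exp\big(m\beta - P\log(2M^5)\big) \le e^{-\mu m \beta \lambda}, \end{aligned}$$
which is just $Q_2[1]$. Similarly, it holds that
$$\begin{aligned} 1 + \|u_1\|_{k+10m} &= 1 + \|S_0 L(0) T(0)\|_{k+10m} \\ &\le M e^{11 m\beta}\|L(0)T(0)\|_{k-m} \\ &\le M^2 e^{11 m\beta}\|T(0)\|_k \\ &\le M^2 e^{11 m\beta}\|T(0)\|_{k+9m} \\ &\le 2^{-P} M^{-5P} e^{11 m\beta}. \end{aligned}$$
As a result, we have
$$(1 + \|u_1\|_{k+10m})\, e^{-\mu m \beta \lambda} \le 2^{-P} M^{-5P} e^{(11 - \mu\lambda) m\beta} = \exp\big(\tfrac{61}{8} m\beta - P\log(2M^5)\big) \le 1,$$
which gives us $Q_3[1]$.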
Now, assume that $Q_2[j]$ is true for $j \le n$. Then using $\lambda^j \ge (\lambda - 1)j$, $j = 1, 2, \ldots$, we estimate
$$\|u_n\|_k \le \sum_{j=1}^{n} \|u_j - u_{j-1}\|_k \le \sum_{j=1}^{\infty} e^{-\mu m \beta \lambda^j} \le \sum_{j=1}^{\infty} e^{-\mu m \beta (\lambda - 1) j} = \frac{e^{-\mu m \beta(\lambda - 1)}}{1 - e^{-\mu m \beta(\lambda - 1)}} \le 1,$$
where the last inequality holds because $\log 2 \le \mu m \beta(\lambda - 1)$. Thus, $Q_1[n]$ holds.

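Next, suppose that $Q_1[j]$, $Q_2[j]$, $Q_3[j]$ are true for $j \le n$. Then using (6.50), (A), (2a) of Theorem 6.4.2, (6.47) as in the proof of Proposition 6.4.3, $Q_2[n]$, the bound on $DT$, (B), and (2c) of Theorem 6.4.2, we estimate
$$\begin{aligned} \|u_{n+1} - u_n\|_k &= \|S_n L(u_n) T(u_n)\|_k \\ &\le M e^{m\beta\lambda^n}\|L(u_n)T(u_n)\|_{k-m} \\ &\le M^2 e^{m\beta\lambda^n}\|T(u_n)\|_k \\ &\le M^2 e^{m\beta\lambda^n}\|T(u_{n-1}) - DT(u_{n-1})S_{n-1}L(u_{n-1})T(u_{n-1})\|_k + M^3 e^{m\beta\lambda^n}\|u_n - u_{n-1}\|_k^2 \\ &\le M^2 e^{m\beta\lambda^n}\|DT(u_{n-1})(I - S_{n-1})L(u_{n-1})T(u_{n-1})\|_k + M^3 e^{m\beta\lambda^n} e^{-2\mu m\beta\lambda^n} \\ &\le M^3 e^{m\beta\lambda^n}\|(I - S_{n-1})L(u_{n-1})T(u_{n-1})\|_k + M^3 e^{m\beta\lambda^n} e^{-2\mu m\beta\lambda^n} \\ &\le M^3 e^{m\beta\lambda^n}\Big[M e^{-9m\beta\lambda^{n-1}}\|L(u_{n-1})T(u_{n-1})\|_{k+9m} + e^{-2\mu m\beta\lambda^n}\Big] \\ &\le M^3 e^{m\beta\lambda^n}\Big[M e^{-9m\beta\lambda^{n-1}} M(1 + \|u_{n-1}\|_{k+10m}) + e^{-2\mu m\beta\lambda^n}\Big] \\ &\le M^3 e^{m\beta\lambda^n}\Big[M^2 e^{-9m\beta\lambda^{n-1}} e^{\mu m\beta\lambda^{n-1}} + e^{-2\mu m\beta\lambda^n}\Big] \\ &= M^3\Big[M^2 e^{m\beta\lambda^{n-1}(\lambda + \mu - 9)} + e^{m\beta\lambda^n(1 - 2\mu)}\Big] \\ &\le M^3(M^2 + 1)\, e^{-(21/4) m\beta\lambda^{n-1}} \\ &\le 2M^5 e^{-(21/4) m\beta\lambda^{n-1}} \le e^{-\mu m\beta\lambda^{n+1}}, \end{aligned}$$
where the last inequality is true because
$$\log(2M^5) \le \tfrac{3}{16} m\beta \le \lambda^{n-1} m\beta\,[(21/4) - \mu\lambda^2].$$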


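To prove $Q_3[n+1]$, we observe by (6.50), (A), (2c) of Theorem 6.4.2, and $Q_3[j]$, $j = 1, 2, \ldots, n$, that
$$\begin{aligned} 1 + \|u_{n+1}\|_{k+10m} &\le 1 + \sum_{j=0}^{n}\|S_j L(u_j) T(u_j)\|_{k+10m} \\ &\le 1 + M\sum_{j=0}^{n} e^{m\beta\lambda^j}\|L(u_j)T(u_j)\|_{k+9m} \\ &\le 1 + M^2\sum_{j=0}^{n} e^{m\beta\lambda^j}(1 + \|u_j\|_{k+10m}) \\ &\le 1 + M^2\sum_{j=0}^{n} e^{m\beta(1+\mu)\lambda^j}. \end{aligned}$$
As a result, we have
$$\begin{aligned} (1 + \|u_{n+1}\|_{k+10m})\, e^{-\mu m\beta\lambda^{n+1}} &\le e^{-\mu m\beta\lambda^{n+1}} + M^2 e^{m\beta\lambda^n(1 + \mu - \mu\lambda)} + M^2\sum_{j=0}^{n-1} e^{m\beta[(1+\mu)\lambda^j - \mu\lambda^{n+1}]} \\ &\le e^{-\mu m\beta\lambda^{n+1}} + M^2 e^{-(1/8) m\beta\lambda^n} + M^2\sum_{j=0}^{\infty} e^{m\beta\lambda^j(1 + \mu - \mu\lambda^2)} \\ &= e^{-\mu m\beta\lambda^{n+1}} + M^2 e^{-(1/8) m\beta\lambda^n} + M^2\sum_{j=0}^{\infty} e^{-(29/16) m\beta\lambda^j} \\ &\le e^{-(81/16) m\beta} + M^2 e^{-(3/16) m\beta} + M^2\sum_{j=0}^{\infty} e^{-(29/16) m\beta(\lambda - 1)(j+1)} \\ &= e^{-(81/16) m\beta} + M^2 e^{-(3/16) m\beta} + M^2\,\frac{e^{-(29/16) m\beta(\lambda - 1)}}{1 - e^{-(29/16) m\beta(\lambda - 1)}} \\ &= (2M^5)^{-81/2} + M^2 (2M^5)^{-3/2} + M^2\,\frac{(2M^5)^{-29/4}}{1 - (2M^5)^{-29/4}} \le 1. \end{aligned}$$
Thus $Q_3[n+1]$ follows, and our induction is complete.
Finally, the proof concludes as in that of Proposition 6.4.3. □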
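
The numerical exponents that appear in the induction ($21/4$, $1/8$, $29/16$, $81/16$, and the value $P = 61$) are all arithmetic consequences of the choices in (6.48). The sketch below checks that arithmetic with exact fractions, assuming the readings $\lambda = 3/2$ and $\mu = 9/4$; the dictionary keys are descriptive labels, not notation from the text.

```python
from fractions import Fraction as Fr

lam, mu = Fr(3, 2), Fr(9, 4)   # assumed values of lambda and mu from (6.48)

checks = {
    "lam + mu - 9": lam + mu - 9,                # exponent -21/4 in the Q2 step
    "(1 - 2*mu) * lam": (1 - 2 * mu) * lam,      # also -21/4
    "1 + mu - mu*lam": 1 + mu - mu * lam,        # exponent -1/8 in the Q3 step
    "1 + mu - mu*lam**2": 1 + mu - mu * lam**2,  # exponent -29/16
    "mu * lam**2": mu * lam**2,                  # 81/16
    "8 * (11 - mu*lam)": 8 * (11 - mu * lam),    # the constant P
}
for name, value in checks.items():
    print(name, "=", value)
```

The factor 8 in the last entry comes from $m\beta = 8\log(2M^5)$, which converts exponents of the form $c\,m\beta$ into powers $(2M^5)^{8c}$.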


6.4.5 Construction of the Smoothing Operators 


We shall construct the requisite smoothing operators on Euclidean space. Let $\phi$ be a $C^\infty$ function with compact support in $\mathbb{R}^N$, supported in $B(0, 1)$ and identically equal to 1 near the origin. Let $a$ be the inverse Fourier transform of $\phi$. That is,
$$a(x) = c_N \int_{\mathbb{R}^N} e^{i x \cdot \xi}\, \phi(\xi)\, d\xi.$$
Then of course $|a(x)| \le C$ and, integrating by parts, we see that
$$|a(x)| = \left| \frac{c_N}{x^\gamma} \int_{\mathbb{R}^N} e^{i x \cdot \xi}\, D^\gamma \phi(\xi)\, d\xi \right|$$
for any multiindex $\gamma$. As a result,
$$|a(x)| \le C_K (1 + |x|)^{-K}.$$
The same calculation may be performed on any derivative of $a$. We find that
$$|D^\alpha a(x)| \le C_{\alpha,K}(1 + |x|)^{-K}.$$
It also holds that
$$\int a(x)\, dx = \hat{a}(0) = 1$$
and
$$\int x^\gamma a(x)\, dx = (-1)^{|\gamma|} D^\gamma \hat{a}(0) = 0$$
for all multiindices $\gamma$ with $|\gamma| > 0$.
Now we define
$$[S(t)u](x) = t^N \int_{\mathbb{R}^N} a(t(x - y))\, u(y)\, dy.$$
This is a standard convolution construction, commonly used in the theory of partial differential equations and harmonic analysis (see, for example, Krantz [Kr 99]). Let us first establish property (A) of smoothing operators when $r = 0$. We need to show that
$$\|S(t)u\|_\rho \le M t^\rho \|u\|_0. \tag{6.51}$$
If $|\alpha| \le \rho$ then we calculate that
$$\begin{aligned} \|D^\alpha S(t)u\|_0 &= \sup_x \left| t^{|\alpha|} t^N \int_{\mathbb{R}^N} (D^\alpha a)(t(x - y))\, u(y)\, dy \right| \\ &\le \sup_x\, t^{|\alpha|} t^N \int_{\mathbb{R}^N} C_{\alpha,K}(1 + |t(x - y)|)^{-K}\, |u(y)|\, dy \\ &\le M t^{|\alpha|} \|u\|_0 \\ &\le M t^\rho \|u\|_0. \end{aligned}$$
That establishes (A) when $r = 0$. Now a particular case of (A) when $r = 0$ is the case when $\rho$ is replaced by $\rho - r$ and also $r = 0$. Thus
$$\|S(t)u\|_{\rho - r} \le M t^{\rho - r}\|u\|_0.$$
Let $\alpha$ be any multi-index with $|\alpha| \le r$. Then
$$\|D^\alpha S(t)u\|_{\rho - r} = \|S(t) D^\alpha u\|_{\rho - r} \le M t^{\rho - r}\|D^\alpha u\|_0 \le M t^{\rho - r}\|u\|_r.$$
Since this is true for all such $\alpha$, we have
$$\|S(t)u\|_\rho \le M t^{\rho - r}\|u\|_r.$$
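We tackle (B) in the same way. The case $r = 0$ reduces to
$$\|(I - S(t))u\|_0 \le M t^{-\rho}\|u\|_\rho.$$
To establish this inequality, we apply Taylor's theorem with remainder:
$$\varphi(1) = \sum_{k=0}^{m-1} \frac{\varphi^{(k)}(0)}{k!} + \frac{1}{(m-1)!}\int_0^1 (1 - \mu)^{m-1} \varphi^{(m)}(\mu)\, d\mu.$$
We let $\varphi(t) = u(x + ty)$ and obtain
$$u(x + y) = \sum_{k=0}^{\rho - 1}\ \sum_{|\alpha| = k} \frac{y^\alpha}{\alpha!} D^\alpha u(x) + \sum_{|\alpha| = \rho} \frac{\rho}{\alpha!}\, y^\alpha \int_0^1 (1 - \mu)^{\rho - 1} D^\alpha u(x + \mu y)\, d\mu.$$
Thus we have
$$\begin{aligned} u(x) - S(t)u(x) &= u(x) - t^N \int_{\mathbb{R}^N} a(t(x - y))\, u(y)\, dy \\ &= u(x) - t^N \int_{\mathbb{R}^N} a(ty)\, u(x + y)\, dy \\ &= t^N \int_{\mathbb{R}^N} a(ty)\, u(x)\, dy - t^N \int_{\mathbb{R}^N} a(ty)\, u(x + y)\, dy \\ &= t^N \int_{\mathbb{R}^N} a(ty)\,[u(x) - u(x + y)]\, dy \\ &\overset{\text{(Taylor)}}{=} -\sum_{|\alpha| = \rho} \frac{\rho}{\alpha!}\, t^N \int_{\mathbb{R}^N}\int_0^1 a(ty)(1 - \mu)^{\rho - 1} y^\alpha D^\alpha u(x + \mu y)\, d\mu\, dy. \end{aligned}$$
The change of variable $ty = z$ yields
$$t^N \int_{\mathbb{R}^N}\int_0^1 a(ty)(1 - \mu)^{\rho - 1} y^\alpha D^\alpha u(x + \mu y)\, d\mu\, dy = t^{-\rho} \int_{\mathbb{R}^N}\int_0^1 a(z)(1 - \mu)^{\rho - 1} z^\alpha D^\alpha u(x + \mu t^{-1} z)\, d\mu\, dz.$$
But now straightforward estimates give the desired conclusion, and (B) is established.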


6.4.6 A Useful Corollary 


We now derive a consequence of Theorem 6.4.2 that is useful in practice (and which, incidentally, is formulated more like a classical implicit function theorem).


Notation 6.4.5 For $z \in \mathbb{R}^N$ define the translation $\tau_z : \mathbb{R}^N \to \mathbb{R}^N$ by setting $\tau_z(x) = x + z$.


Theorem 6.4.6 Let $T$ be a mapping from the unit ball $B^k$ of $C^k(\mathbb{R}^N)$ into $C^{k-\beta}(\mathbb{R}^N)$. Assume that $T(0) = 0$. Further suppose that


(1) $T$ has infinitely many continuous Fréchet derivatives.
(2) $T$ is translation invariant in the sense that if $u \in C^k$ with $\|u\|_k \le 1$ then
$$T(u \circ \tau_z) = [T(u)] \circ \tau_z$$
holds for all $z \in \mathbb{R}^N$.


(3) There is a mapping $L$ defined on $B^k$ with values in the space of bounded linear operators from $C^k$ to $C^{k-\beta}$ such that $L(u)$ is translation invariant in the same sense as $T$ (in part (2)), and such that $L(u)$ has infinitely many continuous Fréchet derivatives, and finally such that

(3a) $\|L(u)h\|_{k-\beta} \le M\|h\|_k$, $\forall u \in C^k$, $h \in C^k$,
(3b) $DT(u)L(u)h = h$, $\forall u, h \in C^{k+\beta}$.


Conclusion: The set $T(B^k)$ contains a neighborhood of the origin.


Proof. Since $T$ is translation invariant, it commutes with derivatives. Thus if we 
apply $T$ to a function in $C^m$, $m > k$, then we obtain a function in $C^{m-\beta}$. Similarly, 
$L(u)$ can be considered to be a function whose domain is the unit ball of $C^m$ and 
with range $C^{m-\beta}$. We also have the relations 


(3a)* $\|L(u)h\|_{m-\beta} \le M\|h\|_m$, $\forall u \in C^m$, $h \in C^m$, 
(3b)* $DT(u)L(u)h = h$, $\forall u, h \in C^{m+\beta}$. 


Now we wish to apply the Nash–Moser Theorem 6.4.2. In order to do so, we 
must verify part (2c) of its statement. This follows from the translation-invariance 
of $T$ and $L$ together with part (3a) of the present theorem and the boundedness of 
the derivatives of $T$. The result of applying the Nash–Moser theorem is that if a 
point $x$ is sufficiently near to 0 in $C^{k-\beta}$, then there is a point in $C^k$ whose image 
is $x$. Therefore $T(B^k)$ contains a $C^{k-\beta}$-neighborhood of the origin, and hence 
certainly it contains a $C^\infty$ neighborhood as well. □ 
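
The opening claim of the proof, that translation invariance forces $T$ to commute with derivatives, can be illustrated by a formal difference-quotient computation. This is only a sketch: $e_j$ (the $j$th coordinate vector) and the real parameter $s$ are notation introduced here, and we assume that $u$ is smooth enough for the difference quotients of $u$ to converge in $C^k$.

```latex
\begin{aligned}
\frac{\partial}{\partial x_j}\bigl[T(u)\bigr]
&= \lim_{s \to 0} \frac{[T(u)] \circ \tau_{s e_j} - T(u)}{s} \\
&= \lim_{s \to 0} \frac{T(u \circ \tau_{s e_j}) - T(u)}{s}
   && \text{(hypothesis (2))} \\
&= DT(u)\!\left( \frac{\partial u}{\partial x_j} \right)
   && \text{(Fr\'echet differentiability of } T\text{)} .
\end{aligned}
```

Read this way, each additional derivative of $u$ yields one additional derivative of $T(u)$, which is consistent with the mapping property from $C^m$ into $C^{m-\beta}$ used in the proof.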


Glossary 


absorbent set Let $Y$ be a Hausdorff, locally convex, topological vector space. A 
set $A \subset Y$ is absorbent if $Y = \bigcup_{t>0} tA$. 


anomaly The angle between the direction to an orbiting body and the direction 
to its last perihelion. 


Axiom of Choice A fundamental axiom of set theory that specifies the existence 
of choice functions. 


balanced set A set $B$ is balanced if $cB \subset B$ holds for all scalars $c$ with $|c| \le 1$. 


Banach space A normed linear space that is complete in the topology induced 
by the norm. 


Besov space A space of functions in which smoothness is measured by certain 
$p$th-power integral expressions. 


Borel measurable function A function $f$ with the property that $f^{-1}(U)$ is a 
Borel set when $U$ is an open set. 


Borel set A set in the $\sigma$-algebra generated by the open sets. 


Cauchy estimates Certain inequalities in complex variable theory that allow es- 
timates of the derivatives of a holomorphic function in terms of the maxi- 
mum size of the holomorphic function. 


Cauchy–Kowalewsky theorem A general theorem about the existence and uniqueness 
of real analytic solutions of partial differential equations with real analytic 
data. 


Cauchy–Riemann equations A pair of linear partial differential equations that 
characterize holomorphic functions. 


complex analytic function A function of a complex variable that has a convergent 
power series expansion (in powers of $z$) about each point of its domain. 


continuation method A method for solving nonlinear equations by deforming 
the given equation to a simpler equation. 


contraction A mapping $F$ of a metric space $(X, d)$ with the property that 
there is a constant $0 < c < 1$ such that $d(F(x), F(y)) \le c \cdot d(x, y)$ for all 
$x, y \in X$. 


contraction mapping fixed point principle A theorem that specifies that a con- 
traction on a complete metric space will have a fixed point. 


convex homotopy A homotopy between functions $F$ and $F_0$ given by the formula 
$H(t, x) = (1 - t)F_0(x) + tF(x)$. 


decomposition theorem The result that any smooth mapping may be written as 
the composition of primitive mappings and linear operators which are either 
the identity or which exchange two coordinates. 


differentiability in a Banach space See Fréchet differentiability or Gateaux dif- 
ferentiability. 


Dini’s inductive proof of the implicit function theorem A proof of the implicit 
function theorem that proceeds by induction on the number of dependent 
variables. 


distance function In a metric space, the function that specifies the distance be- 
tween any two points. 


division ring A ring $A$ such that $1 \ne 0$ and such that every non-zero element is 
invertible. 


eccentric anomaly The quantity E in Kepler’s equation E = M + esin(E), 
where M is the mean anomaly and e is the eccentricity. 


eccentricity A numerical parameter $e$ that specifies the shape of an ellipse. 


Euler step One iteration in Euler's method. 


explicit function A function that is given by a formula of the form $y = f(x)$. 


Fréchet derivative A form of the derivative in a Banach space. 


function Let $X$, $Y$ be sets. A function from $X$ to $Y$ is a subset $f$ of $X \times Y$ such 
that (i) for each $x \in X$ there is a $y \in Y$ such that $(x, y) \in f$ and (ii) if 
$(x, y) \in f$ and $(x, y') \in f$ then $y = y'$. 


Fundamental Theorem of Algebra The theorem that guarantees that any polynomial 
of degree at least one, and having complex coefficients, will have a 
complex root. 


Gateaux derivative A form of the directional derivative in a Banach space. 


generalized distance function A modified distance function that is smoother than 
the canonical Euclidean distance function. 


global homotopy A homotopy between a function $F$ and the shifted function 
$F - F(x_0)$ given by $H(t, x) = F(x) - (1 - t)F(x_0)$. 


global inverse function theorem See Hadamard’s theorem. 


Hadamard’s formula A formula for the radius of convergence of a power series 
that is determined by the root test. 


Hadamard’s theorem A global form of the inverse function theorem, i.e., one 
that yields a global rather than a local inverse. 


holomorphic function A function of a complex variable that has a complex deriva- 
tive at each point of its domain. Equivalently, a function with a complex 
power series expansion about each point of its domain. Equivalently, a func- 
tion that satisfies the Cauchy—Riemann equations. 


homotopy method See continuation method. 


homotopy A continuous deformation of curves in a topological space. 


imbedding method See continuation method. 


implicit differentiation A method for differentiating a function that is given im- 
plicitly. 


implicit function theorem paradigm A general conceptual framework for im- 
plicit function theorems. 


implicit function theorem A theorem that gives sufficient conditions under which 
an equation can be solved for some of the variables in terms of the others. 


implicit function A function that is given by an equation, but not explicitly. 


inverse function theorem A theorem that gives sufficient conditions for the local 
invertibility of a mapping. 


Jacobian determinant The determinant of the Jacobian matrix. If the Jacobian 
matrix is DG, then the Jacobian determinant is denoted by det DG. 


Jacobian matrix The matrix of first partial derivatives of a mapping $G$, denoted 
by $DG$. 


Kepler's equation An equation that relates the mean anomaly and the eccentric 
anomaly of a planetary orbit. 


Lagrange expansion A power series expansion used to evaluate an inverse func- 
tion. 


Lagrange inversion theorem See Lagrange expansion. 


Legendre transformation A change of coordinates that puts the Hamiltonian in 
a normalized form. 


Lipschitz mapping A mapping $F$ on a metric space $(X, d)$ with the property that 
$d(F(x), F(y)) \le C \cdot d(x, y)$. 


Lipschitz space A space of Lipschitz functions. 


local coordinates A method of specifying Euclidean-like coordinates in a neigh- 
borhood on a manifold. 


majorant A power series whose coefficients bound above the moduli of the cor- 
responding coefficients of another power series. 


manifold A topological space that is locally homeomorphic to some Euclidean 
space. 


maximum modulus theorem The theorem that says that a holomorphic func- 
tion will never assume a local absolute maximum value in the interior of a 
domain. 


mean anomaly See anomaly and eccentric anomaly. 


method of majorants The technique of using majorants to prove convergence of 
a power series. 


metric space A space in which there is a notion of distance satisfying certain 
standard axioms. 


Nash-Moser implicit function theorem A sophisticated implicit function theo- 
rem proved by a complicated scheme of iteration and smoothing. 


Newton diagram A graphical device for determining the qualitative behavior of 
a locus of points in the plane. 


Newton polygon See Newton diagram. 


Newton's method An iterative method for finding the roots of a function. 


Newton–Raphson formula The iterative formula that occurs in Newton's method. 


parametrization of a surface A method of assigning coordinates to a surface by 
means of a local mapping from Euclidean space. 


HISTORY, THEORY, AND APPLICATIONS 


Steven G. Krantz and Harold R. Parks 


The implicit function theorem is part of the bedrock of mathematical analysis 
and geometry. Finding its genesis in eighteenth century studies of real analytic 
functions and mechanics, the implicit and inverse function theorems have now 
blossomed into powerful tools in the theories of partial differential equations, 
differential geometry, and geometric analysis. 


There are many different forms of the implicit function theorem, including (i) 
the classical formulation for $C^k$ functions, (ii) formulations in other function 
spaces, (iii) formulations for non-smooth functions, and (iv) formulations for 
functions with degenerate Jacobian. Particularly powerful implicit function 
theorems, such as the Nash–Moser theorem, have been developed for specific 
applications (e.g., the imbedding of Riemannian manifolds). All of these topics, 
and many more, are treated in the present volume. 


The history of the implicit function theorem is a lively and complex story, and 
is intimately bound up with the development of fundamental ideas in analysis 
and geometry. This entire development, together with mathematical examples 
and proofs, is recounted for the first time here. It is an exciting tale, and it 
continues to evolve. 


The Implicit Function Theorem is an accessible and thorough treatment of 
implicit and inverse function theorems and their applications. It will be of 
interest to mathematicians, graduate/advanced undergraduate students, and 
to those who apply mathematics. The book unifies disparate ideas that have 
played an important role in modern mathematics. It serves to document and 
place in context a substantial body of mathematical ideas. 